Thread: What is wrong with hashed index usage?

What is wrong with hashed index usage?

From

"Dann Corbit"

Date:

22 April 2002, 17:16:11

From here:
http://osdb.sourceforge.net/
We find this quote:
"For you long-suffering OSDB PostgreSQL users, we offer

--postgresql=no_hash_index

to work around the hash index problems of OSDB with PostgreSQL V7.1 and
7.2. As always, let us know of any problems. May the source be with
you!"

Does anyone know what the above is all about?

Re: What is wrong with hashed index usage?

From

Neil Conway

Date:

22 April 2002, 17:59:19

On Mon, 22 Apr 2002 14:15:37 -0700
"Dann Corbit" <DCorbit@connx.com> wrote:
> From here:
> http://osdb.sourceforge.net/
> We find this quote:
> "For you long-suffering OSDB PostgreSQL users, we offer 
> 
> --postgresql=no_hash_index 
> 
> to work around the hash index problems of OSDB with PostgreSQL V7.1 and
> 7.2. As always, let us know of any problems. May the source be with
> you!"
> 
> Does anyone know what the above is all about?

Yes -- search the list archives, or check the PostgreSQL docs. This problem
has been brought up several times: hash indexes deadlock under concurrent
load. A run of pgbench with a reasonably high concurrency level (10 or 15)
produces the problem consistently.

Previously, I had volunteered to fix this, but
   (a) I'm busy with the PREPARE/EXECUTE stuff at the moment.
   (b) I'm not sure it's worth the investment of time: AFAIK,       hash indexes don't have many advantages over btrees
for      scalar data.

On the other hand, if someone steps forward with some data on a
specific advantage that hash indexes have over btrees, I don't
expect that the concurrency problems should be too difficult to
solve.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: What is wrong with hashed index usage?

From

"Dann Corbit"

Date:

22 April 2002, 18:05:37

> -----Original Message-----
> From: Neil Conway [mailto:nconway@klamath.dyndns.org]
> Sent: Monday, April 22, 2002 2:59 PM
> To: Dann Corbit
> Cc: pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] What is wrong with hashed index usage?
>
>
> On Mon, 22 Apr 2002 14:15:37 -0700
> "Dann Corbit" <DCorbit@connx.com> wrote:
> > From here:
> > http://osdb.sourceforge.net/
> > We find this quote:
> > "For you long-suffering OSDB PostgreSQL users, we offer
> >
> > --postgresql=no_hash_index
> >
> > to work around the hash index problems of OSDB with
> PostgreSQL V7.1 and
> > 7.2. As always, let us know of any problems. May the source be with
> > you!"
> >
> > Does anyone know what the above is all about?
>
> Yes -- search the list archives, or check the PostgreSQL
> docs. This problem
> has been brought up several times: hash indexes deadlock
> under concurrent
> load. A run of pgbench with a reasonably high concurrency
> level (10 or 15)
> produces the problem consistently.
>
> Previously, I had volunteered to fix this, but
>
>     (a) I'm busy with the PREPARE/EXECUTE stuff at the moment.
>
>     (b) I'm not sure it's worth the investment of time: AFAIK,
>         hash indexes don't have many advantages over btrees for
>         scalar data.
>
> On the other hand, if someone steps forward with some data on a
> specific advantage that hash indexes have over btrees, I don't
> expect that the concurrency problems should be too difficult to
> solve.

Here is where a hashed index shines:
To find a single item using a key, hashed indexes are enormously faster
than a btree.

That is typically speaking.  I have not done performance benchmarks with
PostgreSQL.

In general, hashed indexes are much to be preferred when you are doing
frequent keyed lookups for single items.  Hashed indexes are (of course)
completely useless for an ordered scan or for wide ranges of continuous
data.

Re: What is wrong with hashed index usage?

From

Neil Conway

Date:

22 April 2002, 18:13:51

On Mon, 22 Apr 2002 15:04:22 -0700
"Dann Corbit" <DCorbit@connx.com> wrote:
> Here is where a hashed index shines:
> To find a single item using a key, hashed indexes are enormously faster
> than a btree.
> 
> That is typically speaking.  I have not done performance benchmarks with
> PostgreSQL.

Yes -- but in the benchmarks I've done, the performance different
is not more than 5% (for tables with ~ 600,000 rows, doing lookups
based on a PK with "="). That said, my benchmarks could very well
be flawed, I didn't spend a lot of time on it. If you'd like to
generate some interest in improving hash indexes, I'd like to see
some empirical data supporting your performance claims.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: What is wrong with hashed index usage?

From

Michael Loftis

Date:

22 April 2002, 19:48:03

The benchmarks will depend mostly on the depth of the Btree.   Hashes 
will be markedly faster only in the case(s) where descending into the 
tree to produce a matching leaf node would take longer than walking to 
the appropriate item in a hash.

Most of the time until the btree gets deep they are nearly equivalent. When the tree ends up becoming many levels deep
itcan take longer to 

walk than the hash.

Neil Conway wrote:

>On Mon, 22 Apr 2002 15:04:22 -0700
>"Dann Corbit" <DCorbit@connx.com> wrote:
>
>>Here is where a hashed index shines:
>>To find a single item using a key, hashed indexes are enormously faster
>>than a btree.
>>
>>That is typically speaking.  I have not done performance benchmarks with
>>PostgreSQL.
>>
>
>Yes -- but in the benchmarks I've done, the performance different
>is not more than 5% (for tables with ~ 600,000 rows, doing lookups
>based on a PK with "="). That said, my benchmarks could very well
>be flawed, I didn't spend a lot of time on it. If you'd like to
>generate some interest in improving hash indexes, I'd like to see
>some empirical data supporting your performance claims.
>
>Cheers,
>
>Neil
>

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

23 April 2002, 19:55:02

Michael Loftis wrote:
> The benchmarks will depend mostly on the depth of the Btree.   Hashes 
> will be markedly faster only in the case(s) where descending into the 
> tree to produce a matching leaf node would take longer than walking to 
> the appropriate item in a hash.
> 
> Most of the time until the btree gets deep they are nearly equivalent. 
>  When the tree ends up becoming many levels deep it can take longer to 
> walk than the hash.

And what causes the btree to get deep?  Is it just the number of rows in
the index?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: What is wrong with hashed index usage?

From

Tom Lane

Date:

25 April 2002, 16:06:44

Michael Loftis <mloftis@wgops.com> writes:
> [ on hash vs btree indexing ]
> Most of the time until the btree gets deep they are nearly equivalent. 
> When the tree ends up becoming many levels deep it can take longer to 
> walk than the hash.

Maybe.  I've just completed a simple benchmark of btree vs hash indexes
as implemented in Postgres, and I can't see any advantage.

Using current sources on Red Hat Linux 7.2, I built a simple test table
containing one integer column, and filled it with 16 million random
integers generated by int4(1000000000 * random()).  With a btree index,
"explain analyze select * from foo where f1 = 314888455" (matching a
single row of the table) took about 22 msec on first try (nothing in
cache), and subsequent repetitions about 0.11 msec.  With a hash index,
the first try took about 28 msec and repetitions about 0.15 msec.
Moreover, the hash index was a whole lot bigger: main table size 674
meg, btree 301 meg, hash 574 meg, which possibly offers part of the
explanation for the greater access time.

I would have tried a larger test case, but this one already taxed
my patience: it took 36 hours to build the hash index (vs 19 minutes
for the btree index).  It looks like hash index build has an O(N^2)
performance curve --- the thing had 100 meg of hash index built within
an hour of starting, but got slower and slower after that.

In short, lack of support for concurrent operations is hardly the
worst problem with Postgres' hash indexes.  If you wanna fix 'em,
be my guest ... but I think I shall spend my time elsewhere.
        regards, tom lane

Re: What is wrong with hashed index usage?

From

Michael Loftis

Date:

25 April 2002, 16:19:38


Tom Lane wrote:

>Michael Loftis <mloftis@wgops.com> writes:
>
>>[ on hash vs btree indexing ]
>>Most of the time until the btree gets deep they are nearly equivalent. 
>>When the tree ends up becoming many levels deep it can take longer to 
>>walk than the hash.
>>
>
>Maybe.  I've just completed a simple benchmark of btree vs hash indexes
>as implemented in Postgres, and I can't see any advantage.
>
>Using current sources on Red Hat Linux 7.2, I built a simple test table
>containing one integer column, and filled it with 16 million random
>integers generated by int4(1000000000 * random()).  With a btree index,
>"explain analyze select * from foo where f1 = 314888455" (matching a
>single row of the table) took about 22 msec on first try (nothing in
>cache), and subsequent repetitions about 0.11 msec.  With a hash index,
>the first try took about 28 msec and repetitions about 0.15 msec.
>Moreover, the hash index was a whole lot bigger: main table size 674
>meg, btree 301 meg, hash 574 meg, which possibly offers part of the
>explanation for the greater access time.
>
>I would have tried a larger test case, but this one already taxed
>my patience: it took 36 hours to build the hash index (vs 19 minutes
>for the btree index).  It looks like hash index build has an O(N^2)
>performance curve --- the thing had 100 meg of hash index built within
>an hour of starting, but got slower and slower after that.
>
>In short, lack of support for concurrent operations is hardly the
>worst problem with Postgres' hash indexes.  If you wanna fix 'em,
>be my guest ... but I think I shall spend my time elsewhere.
>
I said can, no will.  The particular btree implementation dictates what 
sorts of operations become bogged down.  I do agree that in pretty much 
every case, a well implemented btree will be better than a hash though. I don't know about PGs implementation but since
Iassume oyu all 
 
inhereted atleast part of it from the berkely boys you should be in very 
solid form.

>
>            regards, tom lane
>

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

25 April 2002, 16:38:17

Nice report.  I think we should start thinking of hiding the hash option
from users, or warn them more forcefully, rather than hold it out as a
possible option for them.

People think hash is best for equals-only queries, and btree for others,
and we can now see this clearly isn't the case.

---------------------------------------------------------------------------

Tom Lane wrote:
> Michael Loftis <mloftis@wgops.com> writes:
> > [ on hash vs btree indexing ]
> > Most of the time until the btree gets deep they are nearly equivalent. 
> > When the tree ends up becoming many levels deep it can take longer to 
> > walk than the hash.
> 
> Maybe.  I've just completed a simple benchmark of btree vs hash indexes
> as implemented in Postgres, and I can't see any advantage.
> 
> Using current sources on Red Hat Linux 7.2, I built a simple test table
> containing one integer column, and filled it with 16 million random
> integers generated by int4(1000000000 * random()).  With a btree index,
> "explain analyze select * from foo where f1 = 314888455" (matching a
> single row of the table) took about 22 msec on first try (nothing in
> cache), and subsequent repetitions about 0.11 msec.  With a hash index,
> the first try took about 28 msec and repetitions about 0.15 msec.
> Moreover, the hash index was a whole lot bigger: main table size 674
> meg, btree 301 meg, hash 574 meg, which possibly offers part of the
> explanation for the greater access time.
> 
> I would have tried a larger test case, but this one already taxed
> my patience: it took 36 hours to build the hash index (vs 19 minutes
> for the btree index).  It looks like hash index build has an O(N^2)
> performance curve --- the thing had 100 meg of hash index built within
> an hour of starting, but got slower and slower after that.
> 
> In short, lack of support for concurrent operations is hardly the
> worst problem with Postgres' hash indexes.  If you wanna fix 'em,
> be my guest ... but I think I shall spend my time elsewhere.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: What is wrong with hashed index usage?

From

Neil Conway

Date:

25 April 2002, 17:04:47

On Thu, 25 Apr 2002 16:38:00 -0400 (EDT)
"Bruce Momjian" <pgman@candle.pha.pa.us> wrote:
> 
> Nice report.  I think we should start thinking of hiding the hash option
> from users, or warn them more forcefully, rather than hold it out as a
> possible option for them.

Why not do something Peter E. suggested earlier: if the functionality of
hash indexes is a subset of that offered by btrees, it might be good to
remove the hash index code and treat USING 'hash' as an alias for
USING 'btree'?

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: What is wrong with hashed index usage?

From

Tom Lane

Date:

25 April 2002, 17:15:32

Michael Loftis <mloftis@wgops.com> writes:
>  I don't know about PGs implementation but since I assume oyu all 
> inhereted atleast part of it from the berkely boys you should be in very 
> solid form.

One would have thought so, wouldn't one?  AFAIK the hash index code is
lock-stock-and-barrel straight from Berkeley; we've not touched it
except for minor tweaking (portability issues and such).

I spent a little time reading the code whilst I was waiting for the hash
index build to complete, and was kind of wondering why it bothers to
maintain bitmaps of free space.  Seems like it could just keep all the
free pages chained together in a list, for zero overhead cost, and skip
the bitmaps.  It locks the metapage anyway when allocating or freeing
a page, so keeping the freelist head pointer there doesn't seem like it
would have any performance penalty...

<<whacks self on head>> NO <<whack>>  I am not getting involved with the
hash index code.  I don't think it's worth our trouble.
        regards, tom lane

Re: What is wrong with hashed index usage?

From

Michael Loftis

Date:

25 April 2002, 17:34:04

The idea behind hte bitmap (correct me if I'm wrong) is that when larger 
allocationsa re asked for they can be quickly found and there is no need 
to maintain the coalescing of smaller adjacent blocks into larger ones.

I don't know if pg does this or not, but thats the only sane reason I 
can come up with.

*quietly installs an rm -rf trigger if tom does any I/O on the has files 
outside of the compiler* This is for your own safety Tom...  Well that 
and our amusement.... :)

Tom Lane wrote:

>Michael Loftis <mloftis@wgops.com> writes:
>
>> I don't know about PGs implementation but since I assume oyu all 
>>inhereted atleast part of it from the berkely boys you should be in very 
>>solid form.
>>
>
>One would have thought so, wouldn't one?  AFAIK the hash index code is
>lock-stock-and-barrel straight from Berkeley; we've not touched it
>except for minor tweaking (portability issues and such).
>
>I spent a little time reading the code whilst I was waiting for the hash
>index build to complete, and was kind of wondering why it bothers to
>maintain bitmaps of free space.  Seems like it could just keep all the
>free pages chained together in a list, for zero overhead cost, and skip
>the bitmaps.  It locks the metapage anyway when allocating or freeing
>a page, so keeping the freelist head pointer there doesn't seem like it
>would have any performance penalty...
>
><<whacks self on head>> NO <<whack>>  I am not getting involved with the
>hash index code.  I don't think it's worth our trouble.
>
>            regards, tom lane
>

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

25 April 2002, 22:25:19

Neil Conway wrote:
> On Thu, 25 Apr 2002 16:38:00 -0400 (EDT)
> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote:
> > 
> > Nice report.  I think we should start thinking of hiding the hash option
> > from users, or warn them more forcefully, rather than hold it out as a
> > possible option for them.
> 
> Why not do something Peter E. suggested earlier: if the functionality of
> hash indexes is a subset of that offered by btrees, it might be good to
> remove the hash index code and treat USING 'hash' as an alias for
> USING 'btree'?

I hate to do that because it makes people think something special is
happening for hash, but it isn't.  We could throw an elog(NOTICE)
stating that hash is not recommended and btree is faster, or something
like that.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: What is wrong with hashed index usage?

From

Tom Lane

Date:

25 April 2002, 22:33:05

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I hate to do that because it makes people think something special is
> happening for hash, but it isn't.  We could throw an elog(NOTICE)
> stating that hash is not recommended and btree is faster, or something
> like that.

I think the only action called for is some improvement in the
documentation.  Right now the docs are not honest about the state
of any of the non-btree index methods.  Ain't none of 'em ready
for prime time IMHO.  GIST is the only one that's getting any
development attention --- and probably the only one that deserves
it, given limited resources.  Hash offers no compelling advantage
over btree AFAICS, and rtree is likewise dominated by GIST (or would
be, if we shipped rtree-equivalent GIST opclasses in the standard
distribution).

I do not like "throw an elog" as a substitute for documentation.
        regards, tom lane

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

20 June 2002, 23:45:25

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I hate to do that because it makes people think something special is
> > happening for hash, but it isn't.  We could throw an elog(NOTICE)
> > stating that hash is not recommended and btree is faster, or something
> > like that.
>
> I think the only action called for is some improvement in the
> documentation.  Right now the docs are not honest about the state
> of any of the non-btree index methods.  Ain't none of 'em ready
> for prime time IMHO.  GIST is the only one that's getting any
> development attention --- and probably the only one that deserves
> it, given limited resources.  Hash offers no compelling advantage
> over btree AFAICS, and rtree is likewise dominated by GIST (or would
> be, if we shipped rtree-equivalent GIST opclasses in the standard
> distribution).
>
> I do not like "throw an elog" as a substitute for documentation.

OK, documentation changes for hash attached.  Do we need to also throw
a elog(WARNING) about its use?  I don't think everyone is going to see
these documentation changes, and I hate to add it to the FAQ.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: doc/src/sgml/indices.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v
retrieving revision 1.31
diff -c -r1.31 indices.sgml
*** doc/src/sgml/indices.sgml    7 Jan 2002 02:29:12 -0000    1.31
--- doc/src/sgml/indices.sgml    21 Jun 2002 03:13:47 -0000
***************
*** 181,192 ****
  </synopsis>
     <note>
      <para>
!      Because of the limited utility of hash indexes, a B-tree index
!      should generally be preferred over a hash index.  We do not have
!      sufficient evidence that hash indexes are actually faster than
!      B-trees even for <literal>=</literal> comparisons.  Moreover,
!      hash indexes require coarser locks; see <xref
!      linkend="locking-indexes">.
      </para>
     </note>
    </para>
--- 181,189 ----
  </synopsis>
     <note>
      <para>
!      Testing has shown that hash indexes are slower than btree indexes,
!      and the size and build time for hash indexes is much worse. For
!      these reasons, hash index use is discouraged.
      </para>
     </note>
    </para>
Index: doc/src/sgml/xindex.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v
retrieving revision 1.25
diff -c -r1.25 xindex.sgml
*** doc/src/sgml/xindex.sgml    29 May 2002 17:36:40 -0000    1.25
--- doc/src/sgml/xindex.sgml    21 Jun 2002 03:13:48 -0000
***************
*** 11,19 ****

    <para>
     The procedures described thus far let you define new types, new
!    functions, and new operators.  However, we cannot yet define a secondary
!    index (such as a B-tree, R-tree, or
!    hash access method) over a new type or its operators.
    </para>

    <para>
--- 11,19 ----

    <para>
     The procedures described thus far let you define new types, new
!    functions, and new operators. However, we cannot yet define a
!    secondary index (such as a B-tree, R-tree, or hash access method)
!    over a new type or its operators.
    </para>

    <para>
Index: doc/src/sgml/ref/create_index.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/create_index.sgml,v
retrieving revision 1.31
diff -c -r1.31 create_index.sgml
*** doc/src/sgml/ref/create_index.sgml    18 May 2002 15:44:47 -0000    1.31
--- doc/src/sgml/ref/create_index.sgml    21 Jun 2002 03:13:48 -0000
***************
*** 329,334 ****
--- 329,339 ----
      an indexed attribute is involved in a comparison using
      the <literal>=</literal> operator.
     </para>
+    <para>
+      Testing has shown that hash indexes are slower than btree indexes,
+      and the size and build time for hash indexes is much worse. For
+      these reasons, hash index use is discouraged.
+    </para>

     <para>
      Currently, only the B-tree and gist access methods support multicolumn

Re: What is wrong with hashed index usage?

From

Oleg Bartunov

Date:

21 June 2002, 01:46:49

We have documented current GiST interface but in russian.
http://www.sai.msu.su/~megera/postgres/gist/doc/gist-inteface-r.shtml
We have no time to translate it to english :-)
I'd appreciate if somebody could help us in documentation -
Oleg
On Thu, 20 Jun 2002, Bruce Momjian wrote:

> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > I hate to do that because it makes people think something special is
> > > happening for hash, but it isn't.  We could throw an elog(NOTICE)
> > > stating that hash is not recommended and btree is faster, or something
> > > like that.
> >
> > I think the only action called for is some improvement in the
> > documentation.  Right now the docs are not honest about the state
> > of any of the non-btree index methods.  Ain't none of 'em ready
> > for prime time IMHO.  GIST is the only one that's getting any
> > development attention --- and probably the only one that deserves
> > it, given limited resources.  Hash offers no compelling advantage
> > over btree AFAICS, and rtree is likewise dominated by GIST (or would
> > be, if we shipped rtree-equivalent GIST opclasses in the standard
> > distribution).
> >
> > I do not like "throw an elog" as a substitute for documentation.
>
> OK, documentation changes for hash attached.  Do we need to also throw
> a elog(WARNING) about its use?  I don't think everyone is going to see
> these documentation changes, and I hate to add it to the FAQ.
>
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: What is wrong with hashed index usage?

From

Tom Lane

Date:

21 June 2002, 09:29:51

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>       <para>
> !      Because of the limited utility of hash indexes, a B-tree index
> !      should generally be preferred over a hash index.  We do not have
> !      sufficient evidence that hash indexes are actually faster than
> !      B-trees even for <literal>=</literal> comparisons.  Moreover,
> !      hash indexes require coarser locks; see <xref
> !      linkend="locking-indexes">.
>       </para>
>      </note>  
>     </para>
> --- 181,189 ----
>   </synopsis>
>      <note>
>       <para>
> !      Testing has shown that hash indexes are slower than btree indexes,
> !      and the size and build time for hash indexes is much worse. For
> !      these reasons, hash index use is discouraged.

This change strikes me as a step backwards.  The existing wording tells
the truth; the proposed revision removes the facts in favor of a blanket
assertion that is demonstrably false.
        regards, tom lane

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

21 June 2002, 09:33:54

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >       <para>
> > !      Because of the limited utility of hash indexes, a B-tree index
> > !      should generally be preferred over a hash index.  We do not have
> > !      sufficient evidence that hash indexes are actually faster than
> > !      B-trees even for <literal>=</literal> comparisons.  Moreover,
> > !      hash indexes require coarser locks; see <xref
> > !      linkend="locking-indexes">.
> >       </para>
> >      </note>  
> >     </para>
> > --- 181,189 ----
> >   </synopsis>
> >      <note>
> >       <para>
> > !      Testing has shown that hash indexes are slower than btree indexes,
> > !      and the size and build time for hash indexes is much worse. For
> > !      these reasons, hash index use is discouraged.
> 
> This change strikes me as a step backwards.  The existing wording tells
> the truth; the proposed revision removes the facts in favor of a blanket
> assertion that is demonstrably false.

OK, which part of is "demonstrably false"?  I think the old "should
generally be preferred" is too vague.  No one has come up with a case
where hash has shown to be faster, and a lot of cases where it is slower.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: What is wrong with hashed index usage?

From

Tom Lane

Date:

21 June 2002, 11:01:13

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> OK, which part of is "demonstrably false"?  I think the old "should
> generally be preferred" is too vague.  No one has come up with a case
> where hash has shown to be faster, and a lot of cases where it is slower.

The only thing I recall being lots worse is initial index build.

I have not tested it much, but I would expect that hash holds up better
in the presence of many equal keys than btree does...
        regards, tom lane

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

21 June 2002, 11:18:11

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > OK, which part of is "demonstrably false"?  I think the old "should
> > generally be preferred" is too vague.  No one has come up with a case
> > where hash has shown to be faster, and a lot of cases where it is slower.
>
> The only thing I recall being lots worse is initial index build.
>
> I have not tested it much, but I would expect that hash holds up better
> in the presence of many equal keys than btree does...

I remember three problems:  build time, index size, and concurrency
problems.  I was wondering about the equal key case myself, and assumed
hash may be a win there, but with the concurrency problems, is that even
possible?

OK, I have reworded it.  Is that better?  How about an elog(NOTICE) for
hash use?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: doc/src/sgml/indices.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v
retrieving revision 1.32
diff -c -r1.32 indices.sgml
*** doc/src/sgml/indices.sgml    21 Jun 2002 03:25:53 -0000    1.32
--- doc/src/sgml/indices.sgml    21 Jun 2002 15:00:32 -0000
***************
*** 181,188 ****
  </synopsis>
     <note>
      <para>
!      Testing has shown that hash indexes are slower than btree indexes,
!      and the size and build time for hash indexes is much worse. For
       these reasons, hash index use is discouraged.
      </para>
     </note>
--- 181,188 ----
  </synopsis>
     <note>
      <para>
!      Testing has shown hash indexes to be similar or slower than btree indexes,
!      and the index size and build time for hash indexes is much worse. For
       these reasons, hash index use is discouraged.
      </para>
     </note>
Index: doc/src/sgml/ref/create_index.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/create_index.sgml,v
retrieving revision 1.32
diff -c -r1.32 create_index.sgml
*** doc/src/sgml/ref/create_index.sgml    21 Jun 2002 03:25:53 -0000    1.32
--- doc/src/sgml/ref/create_index.sgml    21 Jun 2002 15:00:32 -0000
***************
*** 330,337 ****
      the <literal>=</literal> operator.
     </para>
     <para>
!      Testing has shown that hash indexes are slower than btree indexes,
!      and the size and build time for hash indexes is much worse. For
       these reasons, hash index use is discouraged.
     </para>

--- 330,337 ----
      the <literal>=</literal> operator.
     </para>
     <para>
!      Testing has shown hash indexes to be similar or slower than btree indexes,
!      and the index size and build time for hash indexes is much worse. For
       these reasons, hash index use is discouraged.
     </para>

Re: What is wrong with hashed index usage?

From

Tom Lane

Date:

21 June 2002, 11:52:34

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I remember three problems:  build time, index size, and concurrency
> problems.  I was wondering about the equal key case myself, and assumed
> hash may be a win there, but with the concurrency problems, is that even
> possible?

Sure.  Many-equal-keys are a problem for btree whether you have any
concurrency or not.

> OK, I have reworded it.  Is that better?

It's better, but you've still discarded the original's explicit mention
of concurrency problems.  Why do you want to remove information?

> How about an elog(NOTICE) for hash use?

I don't think that's appropriate.
        regards, tom lane

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

21 June 2002, 13:41:57

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I remember three problems:  build time, index size, and concurrency
> > problems.  I was wondering about the equal key case myself, and assumed
> > hash may be a win there, but with the concurrency problems, is that even
> > possible?
>
> Sure.  Many-equal-keys are a problem for btree whether you have any
> concurrency or not.
>
> > OK, I have reworded it.  Is that better?
>
> It's better, but you've still discarded the original's explicit mention
> of concurrency problems.  Why do you want to remove information?

OK, concurrency added.  How is that?

>
> > How about an elog(NOTICE) for hash use?
>
> I don't think that's appropriate.

I was thinking of this during CREATE INDEX ... hash:

    NOTICE:  Hash index use is discouraged.  See the CREATE INDEX
    reference page for more information.

Does anyone else like/dislike that?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: doc/src/sgml/indices.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v
retrieving revision 1.32
diff -c -r1.32 indices.sgml
*** doc/src/sgml/indices.sgml    21 Jun 2002 03:25:53 -0000    1.32
--- doc/src/sgml/indices.sgml    21 Jun 2002 16:50:23 -0000
***************
*** 181,189 ****
  </synopsis>
     <note>
      <para>
!      Testing has shown that hash indexes are slower than btree indexes,
!      and the size and build time for hash indexes is much worse. For
!      these reasons, hash index use is discouraged.
      </para>
     </note>
    </para>
--- 181,190 ----
  </synopsis>
     <note>
      <para>
!      Testing has shown hash indexes to be similar or slower than btree
!      indexes, and the index size and build time for hash indexes is much
!      worse. Hash indexes also suffer poor performance under high
!      concurrency. For these reasons, hash index use is discouraged.
      </para>
     </note>
    </para>
Index: doc/src/sgml/ref/create_index.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/create_index.sgml,v
retrieving revision 1.32
diff -c -r1.32 create_index.sgml
*** doc/src/sgml/ref/create_index.sgml    21 Jun 2002 03:25:53 -0000    1.32
--- doc/src/sgml/ref/create_index.sgml    21 Jun 2002 16:50:23 -0000
***************
*** 330,338 ****
      the <literal>=</literal> operator.
     </para>
     <para>
!      Testing has shown that hash indexes are slower than btree indexes,
!      and the size and build time for hash indexes is much worse. For
!      these reasons, hash index use is discouraged.
     </para>

     <para>
--- 330,339 ----
      the <literal>=</literal> operator.
     </para>
     <para>
!      Testing has shown hash indexes to be similar or slower than btree
!      indexes, and the index size and build time for hash indexes is much
!      worse. Hash indexes also suffer poor performance under high
!      concurrency. For these reasons, hash index use is discouraged.
     </para>

     <para>

Re: What is wrong with hashed index usage?

From

"Dann Corbit"

Date:

21 June 2002, 16:28:56

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> Sent: Friday, June 21, 2002 6:32 AM
> To: Tom Lane
> Cc: Neil Conway; mloftis@wgops.com; Dann Corbit;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] What is wrong with hashed index usage?
>
>
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > >       <para>
> > > !      Because of the limited utility of hash indexes, a
> B-tree index
> > > !      should generally be preferred over a hash index.
> We do not have
> > > !      sufficient evidence that hash indexes are actually
> faster than
> > > !      B-trees even for <literal>=</literal> comparisons.
>  Moreover,
> > > !      hash indexes require coarser locks; see <xref
> > > !      linkend="locking-indexes">.
> > >       </para>
> > >      </note>
> > >     </para>
> > > --- 181,189 ----
> > >   </synopsis>
> > >      <note>
> > >       <para>
> > > !      Testing has shown that hash indexes are slower
> than btree indexes,
> > > !      and the size and build time for hash indexes is
> much worse. For
> > > !      these reasons, hash index use is discouraged.
> >
> > This change strikes me as a step backwards.  The existing
> wording tells
> > the truth; the proposed revision removes the facts in favor
> of a blanket
> > assertion that is demonstrably false.
>
> OK, which part of is "demonstrably false"?  I think the old "should
> generally be preferred" is too vague.  No one has come up with a case
> where hash has shown to be faster, and a lot of cases where
> it is slower.

I agree with Tom.  Maybe it is not true for PostgreSQL that hashed
indexes are better, but for every other database if you are doing single
lookups and do not need to order the items sequentially, hashed indexes
are better.  What this indicates to me is that hashed indexes could
{potentially} be much better implemented for PostgreSQL.

See section 2.4:
http://citeseer.nj.nec.com/cache/papers/cs/21214/http:zSzzSzwww.cs.cmu.e
duzSz~christoszSzcourseszSz826-resourceszSzFOILS-LATEXzSzslides.pdf/inde
xing-multimedia-databases.pdf

See
http://ycmi.med.yale.edu/nadkarni/db_course/NonStd_Contents.htm

See also:
http://www-courses.cs.uiuc.edu/~cs411/RR2_goodpoints.html

From the Oracle Rdb documentation:
1.3.5 Retrieval Methods
Oracle Rdb provides several methods for retrieving or accessing data. In
the physical design of your database, consider that Oracle Rdb can use
one or more of the following methods to retrieve the rows in a table:

Sequential: locating a row or rows in sequence by retrieving data within
a logical area
Sorted index lookup with value retrieval: using the database key (dbkey)
for the value from the index to retrieve the row
Sorted index only: using data values in the index key pertinent to your
query
Hashed index retrieval: for retrieving exact data value matches
Dbkey only: retrieving a row through its dbkey
You determine the retrieval method Oracle Rdb chooses by creating one or
more sorted or hashed indexes.

Sorted index retrieval provides indexed sequential access to rows in a
table. (A sorted index is also called a B-tree index.) By contrast,
hashed index retrieval, also known as hash-addressing, provides direct
retrieval of a specific row. Retrieval of a row is based on a given
value of some set of columns in the row (called the search key).

Use a hashed index primarily for random, direct retrieval when you can
supply the entire hashed key on which the hashed index is defined, such
as an employee identification number (ID). For this kind of retrieval,
input/output operations can be significantly reduced, particularly for
tables with many rows and large indexes.

For example, to retrieve a row using a sorted index that is four levels
deep, Oracle Rdb may need to do a total of five input/output operations,
one for each level of the sorted index and one to retrieve the actual
row. By using a hashed index, the number of input/output operations may
be reduced to one or two because hashed index retrieval retrieves the
row directly.

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

21 June 2002, 17:12:51

Dann Corbit wrote:
> > > This change strikes me as a step backwards.  The existing
> > wording tells
> > > the truth; the proposed revision removes the facts in favor
> > of a blanket
> > > assertion that is demonstrably false.
> >
> > OK, which part of is "demonstrably false"?  I think the old "should
> > generally be preferred" is too vague.  No one has come up with a case
> > where hash has shown to be faster, and a lot of cases where
> > it is slower.
>
> I agree with Tom.  Maybe it is not true for PostgreSQL that hashed
> indexes are better, but for every other database if you are doing single
> lookups and do not need to order the items sequentially, hashed indexes
> are better.  What this indicates to me is that hashed indexes could
> {potentially} be much better implemented for PostgreSQL.

Yes, our implementation needs help.  People who know other db's are
probably choosing hash thinking it is as good as btree in our code, and
it isn't.  That's why I wanted the documentation update, and why I am
suggesting the elog(NOTICE).

I have updated the documentation to specifically mention that
PostgreSQL's hashes are slower/similar to btree.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: doc/src/sgml/diskusage.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/diskusage.sgml,v
retrieving revision 1.1
diff -c -r1.1 diskusage.sgml
*** doc/src/sgml/diskusage.sgml    13 Jun 2002 05:15:22 -0000    1.1
--- doc/src/sgml/diskusage.sgml    21 Jun 2002 19:06:03 -0000
***************
*** 22,31 ****
    </para>

    <para>
!    You can monitor disk space from two places; from inside
!    <application>psql</> and from the command line using
!    <application>contrib/oid2name</>. Using <application>psql</> you can
!    issue queries to see the disk usage for any table:
  <programlisting>
  play=# SELECT relfilenode, relpages
  play-# FROM pg_class
--- 22,33 ----
    </para>

    <para>
!    You can monitor disk space from three places: from
!    <application>psql</> using <command>VACUUM</> information, from
!    <application>psql</> using <application>contrib/dbsize</>, and from
!    the command line using <application>contrib/oid2name</>. Using
!    <application>psql</> on a recently vacuumed (or analyzed) database,
!    you can issue queries to see the disk usage of any table:
  <programlisting>
  play=# SELECT relfilenode, relpages
  play-# FROM pg_class
***************
*** 38,47 ****
    </para>

    <para>
!    Each page is typically 8 kilobytes. <literal>relpages</> is only
!    updated by <command>VACUUM</> and <command>ANALYZE</>. To show the
!    space used by <acronym>TOAST</> tables, use a query based on the heap
!    relfilenode:
  <programlisting>
  play=# SELECT relname, relpages
  play-# FROM pg_class
--- 40,49 ----
    </para>

    <para>
!    Each page is typically 8 kilobytes. (Remember, <literal>relpages</>
!    is only updated by <command>VACUUM</> and <command>ANALYZE</>.) To
!    show the space used by <acronym>TOAST</> tables, use a query based on
!    the heap relfilenode shown above:
  <programlisting>
  play=# SELECT relname, relpages
  play-# FROM pg_class
Index: doc/src/sgml/indices.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v
retrieving revision 1.33
diff -c -r1.33 indices.sgml
*** doc/src/sgml/indices.sgml    21 Jun 2002 16:52:00 -0000    1.33
--- doc/src/sgml/indices.sgml    21 Jun 2002 19:06:04 -0000
***************
*** 181,190 ****
  </synopsis>
     <note>
      <para>
!      Testing has shown hash indexes to be similar or slower than btree
!      indexes, and the index size and build time for hash indexes is much
!      worse. Hash indexes also suffer poor performance under high
!      concurrency. For these reasons, hash index use is discouraged.
      </para>
     </note>
    </para>
--- 181,191 ----
  </synopsis>
     <note>
      <para>
!      Testing has shown PostgreSQL's hash indexes to be similar or slower
!      than btree indexes, and the index size and build time for hash
!      indexes is much worse. Hash indexes also suffer poor performance
!      under high concurrency. For these reasons, hash index use is
!      discouraged.
      </para>
     </note>
    </para>
Index: doc/src/sgml/ref/create_index.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/create_index.sgml,v
retrieving revision 1.33
diff -c -r1.33 create_index.sgml
*** doc/src/sgml/ref/create_index.sgml    21 Jun 2002 16:52:00 -0000    1.33
--- doc/src/sgml/ref/create_index.sgml    21 Jun 2002 19:06:05 -0000
***************
*** 330,339 ****
      the <literal>=</literal> operator.
     </para>
     <para>
!      Testing has shown hash indexes to be similar or slower than btree
!      indexes, and the index size and build time for hash indexes is much
!      worse. Hash indexes also suffer poor performance under high
!      concurrency. For these reasons, hash index use is discouraged.
     </para>

     <para>
--- 330,340 ----
      the <literal>=</literal> operator.
     </para>
     <para>
!      Testing has shown PostgreSQL's hash indexes to be similar or slower
!      than btree indexes, and the index size and build time for hash
!      indexes is much worse. Hash indexes also suffer poor performance
!      under high concurrency. For these reasons, hash index use is
!      discouraged.
     </para>

     <para>

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

21 June 2002, 17:36:12

Dann Corbit wrote:
> > I was thinking of this during CREATE INDEX ... hash:
> > 
> >     NOTICE:  Hash index use is discouraged.  See the CREATE INDEX
> >     reference page for more information.
> > 
> > Does anyone else like/dislike that?
> 
> I think it might be OK temporarily, to show that there is some work that
> needs done.  When hashed indexes are fixed, the notice should be
> removed.

Oh, yes, clearly, we would remove it once we had a hash implementation
that had _any_ advantages over btree.

So, is you vote for or against the elog(NOTICE)?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: What is wrong with hashed index usage?

From

"Dann Corbit"

Date:

21 June 2002, 17:37:21

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> Sent: Friday, June 21, 2002 9:52 AM
> To: Tom Lane
> Cc: Neil Conway; mloftis@wgops.com; Dann Corbit;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] What is wrong with hashed index usage?
>
>
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > I remember three problems:  build time, index size, and
> concurrency
> > > problems.  I was wondering about the equal key case
> myself, and assumed
> > > hash may be a win there, but with the concurrency
> problems, is that even
> > > possible?
> >
> > Sure.  Many-equal-keys are a problem for btree whether you have any
> > concurrency or not.
> >
> > > OK, I have reworded it.  Is that better?
> >
> > It's better, but you've still discarded the original's
> explicit mention
> > of concurrency problems.  Why do you want to remove information?
>
> OK, concurrency added.  How is that?
>
> >
> > > How about an elog(NOTICE) for hash use?
> >
> > I don't think that's appropriate.
>
> I was thinking of this during CREATE INDEX ... hash:
>
>     NOTICE:  Hash index use is discouraged.  See the CREATE INDEX
>     reference page for more information.
>
> Does anyone else like/dislike that?

I think it might be OK temporarily, to show that there is some work that
needs done.  When hashed indexes are fixed, the notice should be
removed.

I have not looked at the hash code.  Here is a strategy (off the top of
my head) that seems that it should work:

Use Bob Jenkins' 64 bit generic hash from here (totally free for use and
fast as blazes):
http://burtleburtle.net/bob/hash/index.html

Specifically:
http://burtleburtle.net/bob/c/lookup8.c and routine: hash( k, length,
level)

Now, with a 64 bit hash, there is very tiny probability of a collision
(but you could have duplicate data).
The hash index would consist of nothing more than this:
[long long hash=64 bit hash code][unsigned nmatches=count of matching
hashes][array of {nmatches} pointers directly to the records with that
hash]

This is probably grotesqely oversimplified.  But maybe it will spur an
indea in the person who writes the indexing code.

Re: What is wrong with hashed index usage?

From

Larry Rosenman

Date:

21 June 2002, 17:45:38

On Fri, 2002-06-21 at 11:51, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:

> 
> > 
> > > How about an elog(NOTICE) for hash use?
> > 
> > I don't think that's appropriate.
> 
> I was thinking of this during CREATE INDEX ... hash:
> 
>     NOTICE:  Hash index use is discouraged.  See the CREATE INDEX
>     reference page for more information.
> 
> Does anyone else like/dislike that?
I dislike it.  Some clients/dba's will wonder why we even have them. 

Why should we bug the DBA on EVERY index that is a hash? 

I know I personally hate the FreeBSD linker warnings about certain
functions, and don't like that precedent. 

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

Re: What is wrong with hashed index usage?

From

Larry Rosenman

Date:

21 June 2002, 17:53:51

On Fri, 2002-06-21 at 15:12, Bruce Momjian wrote:
> Larry Rosenman wrote:
> > On Fri, 2002-06-21 at 11:51, Bruce Momjian wrote:
> > > Tom Lane wrote:
> > > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > 
> > > 
> > > > 
> > > > > How about an elog(NOTICE) for hash use?
> > > > 
> > > > I don't think that's appropriate.
> > > 
> > > I was thinking of this during CREATE INDEX ... hash:
> > > 
> > >     NOTICE:  Hash index use is discouraged.  See the CREATE INDEX
> > >     reference page for more information.
> > > 
> > > Does anyone else like/dislike that?
> > I dislike it.  Some clients/dba's will wonder why we even have them. 
> > 
> > Why should we bug the DBA on EVERY index that is a hash? 
> > 
> > I know I personally hate the FreeBSD linker warnings about certain
> > functions, and don't like that precedent. 
> 
> OK, that's enough of a negative vote for me.  So you feel the
> documentation change is enough?  Tom thinks so too.
Yup.

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

21 June 2002, 17:55:17

Larry Rosenman wrote:
> On Fri, 2002-06-21 at 11:51, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> > 
> > > 
> > > > How about an elog(NOTICE) for hash use?
> > > 
> > > I don't think that's appropriate.
> > 
> > I was thinking of this during CREATE INDEX ... hash:
> > 
> >     NOTICE:  Hash index use is discouraged.  See the CREATE INDEX
> >     reference page for more information.
> > 
> > Does anyone else like/dislike that?
> I dislike it.  Some clients/dba's will wonder why we even have them. 
> 
> Why should we bug the DBA on EVERY index that is a hash? 
> 
> I know I personally hate the FreeBSD linker warnings about certain
> functions, and don't like that precedent. 

OK, that's enough of a negative vote for me.  So you feel the
documentation change is enough?  Tom thinks so too.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: What is wrong with hashed index usage?

From

"Dann Corbit"

Date:

21 June 2002, 18:07:16

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> Sent: Friday, June 21, 2002 1:31 PM
> To: Dann Corbit
> Cc: Tom Lane; Neil Conway; mloftis@wgops.com;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] What is wrong with hashed index usage?
>
>
> Dann Corbit wrote:
> > > I was thinking of this during CREATE INDEX ... hash:
> > >
> > >     NOTICE:  Hash index use is discouraged.  See the CREATE INDEX
> > >     reference page for more information.
> > >
> > > Does anyone else like/dislike that?
> >
> > I think it might be OK temporarily, to show that there is
> some work that
> > needs done.  When hashed indexes are fixed, the notice should be
> > removed.
>
> Oh, yes, clearly, we would remove it once we had a hash implementation
> that had _any_ advantages over btree.
>
> So, is you vote for or against the elog(NOTICE)?

I will defer to the preference of the others.  I lean ever so slightly
towards the notice, because it is very unusual for hashed index not to
be faster for single item lookup.

Re: What is wrong with hashed index usage?

From

Thomas Lockhart

Date:

21 June 2002, 22:03:43

> So, is you vote for or against the elog(NOTICE)?

OK, if we are still voting, then I'll mention that I generally dislike
the idea of notices of this kind. And would not like this notice in
particular. So would vote no with both hands ;)

I'm pretty sure that we have a consensus policy (hmm, at least if a
consensus consists of repeated votes on one question or the other) that
notices to protect people from doing what they ask the system to do are
not generally desirable.

Putting messages in as a spur to development is not particularly
effective; witness a few chapters in the docs which consist of "This
needs to be written. Any volunteers?" and which have stayed untouched
for three years now ;)
                    - Thomas

Re: What is wrong with hashed index usage?

From

David Ford

Date:

21 June 2002, 22:11:20

On the other hand, I like hints on how to do things better ;)

David

Thomas Lockhart wrote:

>>So, is you vote for or against the elog(NOTICE)?
>>    
>>
>
>OK, if we are still voting, then I'll mention that I generally dislike
>the idea of notices of this kind. And would not like this notice in
>particular. So would vote no with both hands ;)
>  
>

Re: What is wrong with hashed index usage?

From

Michael Loftis

Date:

21 June 2002, 22:21:08

Which is whay you RTFM ;)

--On Friday, June 21, 2002 10:10 PM -0400 David Ford 
<david+cert@blue-labs.org> wrote:

> On the other hand, I like hints on how to do things better ;)
>
> David
>
> Thomas Lockhart wrote:
>
>>> So, is you vote for or against the elog(NOTICE)?
>>>
>>>
>>
>> OK, if we are still voting, then I'll mention that I generally dislike
>> the idea of notices of this kind. And would not like this notice in
>> particular. So would vote no with both hands ;)
>>
>>
>

Re: What is wrong with hashed index usage?

From

Bruce Momjian

Date:

22 June 2002, 00:02:22

Thomas Lockhart wrote:
> > So, is you vote for or against the elog(NOTICE)?
> 
> OK, if we are still voting, then I'll mention that I generally dislike
> the idea of notices of this kind. And would not like this notice in
> particular. So would vote no with both hands ;)
> 
> I'm pretty sure that we have a consensus policy (hmm, at least if a
> consensus consists of repeated votes on one question or the other) that
> notices to protect people from doing what they ask the system to do are
> not generally desirable.
> 
> Putting messages in as a spur to development is not particularly
> effective; witness a few chapters in the docs which consist of "This
> needs to be written. Any volunteers?" and which have stayed untouched
> for three years now ;)

OK, elog(NOTICE) is voted down.  SGML docs are updated.  We don't need
an FAQ item for this, do we?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026