Thread: USING HASH considered harmful?

USING HASH considered harmful?

From
Stephen Robert Norris
Date:
We've just discovered a rather nasty feature of hashes, namely that
simultaneous reads & writes to a single row will deadlock if there
is a hash index on the table.

I guess this is because PG really has to lock the hash table entry in
both cases. It does, however, make HASH indices completely useless for
any table that you might want to update.

Is this a known feature?

    Stephen

Attachment

Re: USING HASH considered harmful?

From
Tom Lane
Date:
Stephen Robert Norris <srn@commsecure.com.au> writes:
> We've just discovered a rather nasty feature of hashes, namely that
> simultaneous reads & writes to a single row will deadlock if there
> is a hash index on the table.

The hash index code is documented to be subject to deadlocks, but I've
never looked into it to discover the exact conditions that cause
problems.

I could maybe get excited about fixing this if I could think of one
single application wherein a hash index is superior to a btree index.
But I can't, so I think the hash index code should be deprecated and
left to die quietly.

(This para strictly MHO:) In the long run the btree and GIST index types
will probably be the only two of the four present types that remain
supported.  Since btree dominates hash and GIST dominates rtree for
functionality, it's hard to see why we should expend development effort
on the other two.  Right now, GIST isn't really ready for prime time
either (no support for concurrent updates), but it does things that
btree can't do, so there's reason to expend work on it.

As of today, if you're concerned about concurrent updates, btree is the
only Postgres index type you should be using.

            regards, tom lane

Re: USING HASH considered harmful?

From
Bruce Momjian
Date:
Checking application/pgp-signature: FAILURE
-- Start of PGP signed section.
> We've just discovered a rather nasty feature of hashes, namely that
> simultaneous reads & writes to a single row will deadlock if there
> is a hash index on the table.
>
> I guess this is because PG really has to lock the hash table entry in
> both cases. It does, however, make HASH indices completely useless for
> any table that you might want to update.
>
> Is this a known feature?

Yes, I have heard about this problem.  Would you test btree vs hash and
report back which is faster.  I have requested this from >20 people and
no one reported back.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: USING HASH considered harmful?

From
Stephen Robert Norris
Date:
On Thu, Aug 16, 2001 at 09:59:02PM -0400, Bruce Momjian wrote:
>
> Checking application/pgp-signature: FAILURE
> -- Start of PGP signed section.
> > We've just discovered a rather nasty feature of hashes, namely that
> > simultaneous reads & writes to a single row will deadlock if there
> > is a hash index on the table.
> >
> > I guess this is because PG really has to lock the hash table entry in
> > both cases. It does, however, make HASH indices completely useless for
> > any table that you might want to update.
> >
> > Is this a known feature?
>
> Yes, I have heard about this problem.  Would you test btree vs hash and
> report back which is faster.  I have requested this from >20 people and
> no one reported back.

I changed to hash because it was marginally faster for the queries
we were doing - maybe 5% or so. It was very marginal, but we
need every bit of performance we can get.

    Stephen

Re: USING HASH considered harmful?

From
Bruce Momjian
Date:
> > > I guess this is because PG really has to lock the hash table entry in
> > > both cases. It does, however, make HASH indices completely useless for
> > > any table that you might want to update.
> > >
> > > Is this a known feature?
> >
> > Yes, I have heard about this problem.  Would you test btree vs hash and
> > report back which is faster.  I have requested this from >20 people and
> > no one reported back.
>
> I changed to hash because it was marginally faster for the queries
> we were doing - maybe 5% or so. It was very marginal, but we
> need every bit of performance we can get.

Thanks.  Very valuable information.  It tells us that it may be worth
trying to optimize it someday.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026