Thread: Nasty problem in hash indexes

Nasty problem in hash indexes

From: Tom Lane
I've traced through the failure reported here by Markus Kräutner:
http://archives.postgresql.org/pgsql-hackers/2003-08/msg01132.php

What is happening is that as the UPDATE adds tuples (all with the same
hash key value) to the table, the hash bucket being filled eventually
requires more pages, and this results in a _hash_splitpage() operation
(which is misnamed, it should be _hash_splitbucket).  By chance, the
bucket that is selected to be split is the one containing the older key
values, all of which get relocated to the new bucket.  So when control
returns to the indexscan that is sourcing the tuples for the UPDATE,
there are no tuples remaining in the bucket it is looking at, and it
exits thinking it's done.

I'm not sure how many variants on this problem there might be, but
clearly the fundamental bug is that a hash bucket split takes no account
of preserving the state of concurrent index scans.
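
To make the failure concrete, here is a toy model in Python. Everything in it is invented for illustration — it is not PostgreSQL's on-disk structure or C code — but it reproduces the shape of the bug: a scan holds a (bucket, offset) position while same-key inserts force a split that relocates every entry the scan had left to read.

```python
# Toy model of the reported failure (NOT PostgreSQL internals).

class ToyHashIndex:
    def __init__(self):
        self.nbuckets = 1
        self.buckets = {0: []}

    def _bucket_for(self, key):
        return key % self.nbuckets

    def insert(self, key):
        b = self._bucket_for(key)
        self.buckets[b].append(key)
        if len(self.buckets[b]) > 4:    # bucket overflow: split it
            self._split_bucket(b)

    def _split_bucket(self, old):
        # Add a bucket and rehash: entries whose new hash value differs
        # are relocated, as _hash_splitpage relocates index tuples.
        new = self.nbuckets
        self.nbuckets += 1
        keep, moved = [], []
        for key in self.buckets[old]:
            (keep if self._bucket_for(key) == old else moved).append(key)
        self.buckets[old] = keep
        self.buckets[new] = moved

    def scan(self, key):
        # A naive indexscan: it holds a (bucket, offset) position and
        # never learns that a split emptied the bucket under it.
        bucket = self._bucket_for(key)
        pos = 0
        while pos < len(self.buckets[bucket]):
            yield self.buckets[bucket][pos]
            pos += 1

idx = ToyHashIndex()
for _ in range(4):
    idx.insert(1)          # four tuples, all with the same hash key

seen = 0
for _ in idx.scan(1):
    seen += 1
    idx.insert(1)          # the "UPDATE" adds another same-key tuple

# The fifth insert split bucket 0 and moved every key-1 entry to the
# new bucket, so the scan exits early: seen is 1, not 4 or more.
```

The scan silently returns too few tuples — no error, just a short read, which is exactly what makes the bug nasty.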

This is likely to be messy to fix :-(.  A brute-force solution may be
possible by generalizing hash_adjscans so that it can update indexscans
of our own backend for bucket-split operations; we'd have to rely on
page locking to prevent problems against scans of other backends.  The
locking aspect is particularly unattractive because of the possibility
of deadlocks.  If a bucket split fails because of deadlock, we're
probably left with a corrupt hash index.
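
The brute-force idea might look roughly like the following toy sketch. Again, all names and structures here are invented for illustration — this is not the real hash_adjscans, and it ignores the hard part Tom describes (scans of other backends and the page locking / deadlock risk); it only shows the bookkeeping a split would need to do for scans it can see.

```python
# Toy sketch of the brute-force fix: the index tracks active scans, and
# a bucket split repoints each affected scan so it still visits every
# tuple it has not yet returned.  (Invented for illustration only.)

class Scan:
    def __init__(self, index, bucket):
        self.index = index
        # Segments of [bucket, next_offset] still to be visited, in order.
        self.pending = [[bucket, 0]]

    def next(self):
        while self.pending:
            bucket, off = self.pending[0]
            if off < len(self.index.buckets[bucket]):
                self.pending[0][1] += 1
                return self.index.buckets[bucket][off]
            self.pending.pop(0)
        return None            # scan exhausted

class ToyIndex:
    def __init__(self):
        self.nbuckets = 1
        self.buckets = {0: []}
        self.scans = []        # active scans registered with the index

    def insert(self, key):
        b = key % self.nbuckets
        self.buckets[b].append(key)
        if len(self.buckets[b]) > 4:
            self._split(b)

    def _split(self, old):
        new = self.nbuckets
        self.nbuckets += 1
        keep, moved, where = [], [], []
        for key in self.buckets[old]:
            if key % self.nbuckets == old:
                keep.append(key); where.append('keep')
            else:
                moved.append(key); where.append('moved')
        self.buckets[old] = keep
        self.buckets[new] = moved
        # The step the current code is missing: fix up concurrent scans.
        for scan in list(self.scans):
            self._adjust(scan, old, new, where)

    def _adjust(self, scan, old, new, where):
        for seg in list(scan.pending):
            if seg[0] != old:
                continue
            off = seg[1]
            # Of the entries already returned, count how many stayed put;
            # the remaining kept entries resume from there.
            kept_before = where[:off].count('keep')
            seg[1] = kept_before
            # Unreturned entries that moved must still be visited: they
            # start after the already-returned movers in the new bucket.
            scan.pending.append([new, off - kept_before])

# Replay the report's scenario; this time no tuples are lost.
idx = ToyIndex()
for _ in range(4):
    idx.insert(1)                       # four tuples, same hash key

scan = Scan(idx, 1 % idx.nbuckets)
idx.scans.append(scan)

seen = 0
tup = scan.next()
while tup is not None:
    seen += 1
    if seen == 1:
        idx.insert(1)                   # triggers the split mid-scan
    tup = scan.next()
# seen is 5: the 4 original tuples plus the one inserted mid-scan,
# even though the split relocated them all to the new bucket.
```

Even in this toy form the fixup needs exact knowledge of which entries moved and where, which hints at why doing it across backends under page locks invites deadlock.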

Does anyone see a better way?

Does anyone want to vote to jettison the hash index code entirely?
Personally I'm not eager to put a lot of work into fixing it.
        regards, tom lane


Re: Nasty problem in hash indexes

From: "scott.marlowe"
On Thu, 28 Aug 2003, Tom Lane wrote:

> I've traced through the failure reported here by Markus Kräutner:
> http://archives.postgresql.org/pgsql-hackers/2003-08/msg01132.php
> 
> What is happening is that as the UPDATE adds tuples (all with the same
> hash key value) to the table, the hash bucket being filled eventually
> requires more pages, and this results in a _hash_splitpage() operation
> (which is misnamed, it should be _hash_splitbucket).  By chance, the
> bucket that is selected to be split is the one containing the older key
> values, all of which get relocated to the new bucket.  So when control
> returns to the indexscan that is sourcing the tuples for the UPDATE,
> there are no tuples remaining in the bucket it is looking at, and it
> exits thinking it's done.
> 
> I'm not sure how many variants on this problem there might be, but
> clearly the fundamental bug is that a hash bucket split takes no account
> of preserving the state of concurrent index scans.
> 
> This is likely to be messy to fix :-(.  A brute-force solution may be
> possible by generalizing hash_adjscans so that it can update indexscans
> of our own backend for bucket-split operations; we'd have to rely on
> page locking to prevent problems against scans of other backends.  The
> locking aspect is particularly unattractive because of the possibility
> of deadlocks.  If a bucket split fails because of deadlock, we're
> probably left with a corrupt hash index.
> 
> Does anyone see a better way?
> 
> Does anyone want to vote to jettison the hash index code entirely?
> Personally I'm not eager to put a lot of work into fixing it.

I've had naught but bad experiences with hash indexes myself.  Maybe toss 
them and see if someone wants to reimplement them some day?

If I'm reading this right, this bug means you could do:

select * from table where field in (1,2,3,4)

where you should get say 100 rows, and you might not get all 100 rows?  If 
so, then how many other bugs are lurking in the hash index code waiting to 
bite?



Re: Nasty problem in hash indexes

From: Tom Lane
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> If I'm reading this right, this bug means you could do:
> select * from table where field in (1,2,3,4)
> where you should get say 100 rows, and you might not get all 100 rows?

Yes, if you were concurrently inserting into the same table.  The given
example involved UPDATEs, not a SELECT.  Probably INSERT/SELECT could
see the same kind of failure.

I'm not sure whether a failure could occur across two backends (one
inserting and one selecting).  The page-level locking might prevent
that.  Or perhaps not.  If it could happen, of course the problem is
vastly more dangerous than if it can't.

> If so, then how many other bugs are lurking in the hash index code
> waiting to bite?

<shrug> Who's to say?  We've found bugs in the btree logic recently,
too.  But I have lots more confidence in btree.
        regards, tom lane


Re: Nasty problem in hash indexes

From: Neil Conway
On Thu, Aug 28, 2003 at 05:37:39PM -0400, Tom Lane wrote:
> > If so, then how many other bugs are lurking in the hash index code
> > waiting to bite?
> 
> <shrug> Who's to say?  We've found bugs in the btree logic recently,
> too.

I'd rather print a loud warning when a hash index is created, but keep
the code in the tree, than just remove it entirely. That way, we'll
avoid unnecessary bit-rot to some degree, and if someone feels that
they absolutely positively need hash indexes, they will have some
existing work to begin from.

-Neil



Re: Nasty problem in hash indexes

From: "scott.marlowe"
On Thu, 28 Aug 2003, Neil Conway wrote:

> On Thu, Aug 28, 2003 at 05:37:39PM -0400, Tom Lane wrote:
> > > If so, then how many other bugs are lurking in the hash index code
> > > waiting to bite?
> > 
> > <shrug> Who's to say?  We've found bugs in the btree logic recently,
> > too.
> 
> I'd rather print a loud warning when a hash index is created, but keep
> the code in the tree, than just remove it entirely. That way, we'll
> avoid unnecessary bit-rot to some degree, and if someone feels that
> they absolutely positively need hash indexes, they will have some
> existing work to begin from.

Sorry, but if hash indexes really do present a possible race condition 
where you could get a short read WITH NO ERROR, then they should at 
least be disabled, and if you create one USING HASH we should print a 
notice that we're actually creating a btree for you and that hash 
indexes have been deprecated.

I can see leaving the code in as something to work on, but you shouldn't 
have to worry about whether your database is going to return a short 
read without an error.

PostgreSQL's philosophy has always seemed to be correctness first, 
convenience and performance second.  I like that philosophy, compared to 
many other databases out there.




Re: Nasty problem in hash indexes

From: Tom Lane
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On Thu, 28 Aug 2003, Neil Conway wrote:
>> On Thu, Aug 28, 2003 at 05:37:39PM -0400, Tom Lane wrote:
>>> <shrug> Who's to say?  We've found bugs in the btree logic recently,
>>> too.
>> 
>> I'd rather print a loud warning when a hash index is created, but keep
>> the code in the tree, than just remove it entirely.

> PostgreSQL's philosophy has always seemed to be correctness first, 
> convenience and performance second.

I agree --- we either fix this bug or remove hash indexes.  There's no
third choice.  However, I don't agree with killing hash indexes just
because there *might* be more bugs in them.  If we have an impractical-
to-fix bug in front of us, then it's time for harsh measures, but
otherwise ...
        regards, tom lane


Re: Nasty problem in hash indexes

From: "scott.marlowe"
On Fri, 29 Aug 2003, Tom Lane wrote:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > On Thu, 28 Aug 2003, Neil Conway wrote:
> >> On Thu, Aug 28, 2003 at 05:37:39PM -0400, Tom Lane wrote:
> >>> <shrug> Who's to say?  We've found bugs in the btree logic recently,
> >>> too.
> >> 
> >> I'd rather print a loud warning when a hash index is created, but keep
> >> the code in the tree, than just remove it entirely.
> 
> > PostgreSQL's philosophy has always seemed to be correctness first, 
> > convenience and performance second.
> 
> I agree --- we either fix this bug or remove hash indexes.  There's no
> third choice.  However, I don't agree with killing hash indexes just
> because there *might* be more bugs in them.  If we have an impractical-
> to-fix bug in front of us, then it's time for harsh measures, but
> otherwise ...

Sorry if I gave the impression earlier that we should get rid of hash 
indexes because there might be more bugs.  I didn't really mean it that 
way.  I just meant that if this one is going to be a hard fix, that 
might be a deciding factor in how much work someone is willing to put 
into it.

If it's an easy fix then it's likely worth the effort to keep the hash 
indexes around.