Thread: Deprecating Hash Indexes

Deprecating Hash Indexes

From
Simon Riggs
Date:
As discussed on other threads, Hash Indexes are currently a broken
feature and bring us into disrepute.

I describe them as broken because they are neither crash safe, nor
could they properly be called unlogged indexes (or if so, why just
hash?). And also because if somebody suggested implementing them the
way they are now, they would be told they were insane because silent
data corruption is not something we tolerate anymore. We know why they
are like that, but its time to remove the rest of the historical
research legacy. It's hard to say "We respect your data [except if you
press here]" and be taken seriously.

Suggested actions are

* Put WARNINGs in the docs against the use of hash indexes, backpatch
to 8.3. CREATE INDEX gives no warning currently, though Index Types
does mention a caution.

* Mention in the current docs that hash indexes are likely to be
deprecated completely in future releases. Should anybody ever make
them work, we can change that advice quickly but I don't think we're
going to.

Personally, I would like to see them removed into a contrib module to
allow people to re-add them if they understand the risks. ISTM better
to confiscate all foot-guns before they go off and then re-issue them
to marksmen, at the risk of annoying the people that use them with
full knowledge but that's likely a contentious issue.

Alternate views?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Deprecating Hash Indexes

From
Andrew Dunstan
Date:
On 10/14/2012 09:45 AM, Simon Riggs wrote:
> As discussed on other threads, Hash Indexes are currently a broken
> feature and bring us into disrepute.
>
> I describe them as broken because they are neither crash safe, nor
> could they properly be called unlogged indexes (or if so, why just
> hash?). And also because if somebody suggested implementing them the
> way they are now, they would be told they were insane because silent
> data corruption is not something we tolerate anymore. We know why they
> are like that, but its time to remove the rest of the historical
> research legacy. It's hard to say "We respect your data [except if you
> press here]" and be taken seriously.
>
> Suggested actions are
>
> * Put WARNINGs in the docs against the use of hash indexes, backpatch
> to 8.3. CREATE INDEX gives no warning currently, though Index Types
> does mention a caution.
>
> * Mention in the current docs that hash indexes are likely to be
> deprecated completely in future releases. Should anybody ever make
> them work, we can change that advice quickly but I don't think we're
> going to.
>
> Personally, I would like to see them removed into a contrib module to
> allow people to re-add them if they understand the risks. ISTM better
> to confiscate all foot-guns before they go off and then re-issue them
> to marksmen, at the risk of annoying the people that use them with
> full knowledge but that's likely a contentious issue.
>
> Alternate views?


I think we'd be better off to start by implementing unlogged indexes 
generally. I have a client who is using hash indexes for the performance 
gain on a very heavy insert load precisely because they are unlogged, 
and who can get away with it because all their index lookup requirements 
are equality. They are aware of the consequences of a crash, and can 
manage the risk accordingly.

But there is a lot of attraction in the idea of unlogged btree indexes 
for example. And the danger of an unlogged index is substantially less 
than that of an unlogged table. After all, your data is still there, 
quite crash-safe, and you can thus just rebuild the index after a crash.

cheers

andrew



Re: Deprecating Hash Indexes

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> As discussed on other threads, Hash Indexes are currently a broken
> feature and bring us into disrepute.

Yeah ...

> Suggested actions are

I find it curious that you don't even bother to list "add WAL support to
hash indexes" as a possible solution.  It's certainly doable, and
probably not even very much more work than something like moving hash
support to a contrib module would be.  (Assuming that's even possible,
which I doubt, because the core code relies on hash index opclasses even
if not on hash indexes.)
        regards, tom lane



Re: Deprecating Hash Indexes

From
Simon Riggs
Date:
On 14 October 2012 17:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> As discussed on other threads, Hash Indexes are currently a broken
>> feature and bring us into disrepute.
>
> Yeah ...
>
>> Suggested actions are
>
> I find it curious that you don't even bother to list "add WAL support to
> hash indexes" as a possible solution.  It's certainly doable, and
> probably not even very much more work than something like moving hash
> support to a contrib module would be.  (Assuming that's even possible,
> which I doubt, because the core code relies on hash index opclasses even
> if not on hash indexes.)

It is doable, yes.

I think that's at least 2-3 months total work, much of it low-level
bug fixing and eventual production bug hunting. Its gone for 9.3
already, so the earliest this would see production code is Sept 2014.

It's a task beyond most people's ability and beyond the range of
professionals without funding. I don't see anyone funding a medium-low
importance feature ahead of the higher ones out there, so I don't
expect the situation to change for (more) years.

If you personally have time to spare on performance features, there
are many optimizer tasks more worthy of your attention and more
popular also, so I don't ask for a code sprint on this.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Deprecating Hash Indexes

From
Robert Haas
Date:
On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> * Put WARNINGs in the docs against the use of hash indexes, backpatch
> to 8.3. CREATE INDEX gives no warning currently, though Index Types
> does mention a caution.

I'd be in favor of adding such warnings to the documentation if they
are not there already, and possibly even warning on CREATE INDEX ..
USING hash().  I don't think I'd go so far as to say that we should
imply that they'll be removed in a future release.  Given how deeply
intertwined they are with the planner, I doubt that that will happen;
and I think there is enough interest in the technology that it's
likely to eventually be fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Deprecating Hash Indexes

From
Andres Freund
Date:
On Sunday, October 14, 2012 03:45:49 PM Simon Riggs wrote:
> As discussed on other threads, Hash Indexes are currently a broken
> feature and bring us into disrepute.
> 
> I describe them as broken because they are neither crash safe, nor
> could they properly be called unlogged indexes (or if so, why just
> hash?). And also because if somebody suggested implementing them the
> way they are now, they would be told they were insane because silent
> data corruption is not something we tolerate anymore. We know why they
> are like that, but its time to remove the rest of the historical
> research legacy. It's hard to say "We respect your data [except if you
> press here]" and be taken seriously.
> 
> Suggested actions are
> 
> * Put WARNINGs in the docs against the use of hash indexes, backpatch
> to 8.3. CREATE INDEX gives no warning currently, though Index Types
> does mention a caution.
> 
> * Mention in the current docs that hash indexes are likely to be
> deprecated completely in future releases. Should anybody ever make
> them work, we can change that advice quickly but I don't think we're
> going to.
> 
> Personally, I would like to see them removed into a contrib module to
> allow people to re-add them if they understand the risks. ISTM better
> to confiscate all foot-guns before they go off and then re-issue them
> to marksmen, at the risk of annoying the people that use them with
> full knowledge but that's likely a contentious issue.
> 
> Alternate views?

I vote for at least logging a wal record when a hash index is modified which 
uses incomplete actions to set 'indisready = false' in case its replayed. That 
should only use a rather minor amount of code and should help users to find 
problems faster.

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Deprecating Hash Indexes

From
Simon Riggs
Date:
On 15 October 2012 15:13, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> * Put WARNINGs in the docs against the use of hash indexes, backpatch
>> to 8.3. CREATE INDEX gives no warning currently, though Index Types
>> does mention a caution.
>
> I'd be in favor of adding such warnings to the documentation if they
> are not there already, and possibly even warning on CREATE INDEX ..
> USING hash().

Sounds like a good idea.

> I don't think I'd go so far as to say that we should
> imply that they'll be removed in a future release.  Given how deeply
> intertwined they are with the planner, I doubt that that will happen;
> and I think there is enough interest in the technology that it's
> likely to eventually be fixed.

Hash indexes aren't used in the planner. Hash joins use completely
separate code.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Deprecating Hash Indexes

From
Simon Riggs
Date:
On 15 October 2012 15:19, Andres Freund said...

> I vote for at least logging a wal record when a hash index is modified which
> uses incomplete actions to set 'indisready = false' in case its replayed. That
> should only use a rather minor amount of code and should help users to find
> problems faster.

Good idea, though might be harder than it first appears.

How do we issue just one of those per checkpoint, to minimise WAL? How
do we make that change with a physical update WAL? Non-transactional
update? During recovery?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Deprecating Hash Indexes

From
Robert Haas
Date:
On Mon, Oct 15, 2012 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I don't think I'd go so far as to say that we should
>> imply that they'll be removed in a future release.  Given how deeply
>> intertwined they are with the planner, I doubt that that will happen;
>> and I think there is enough interest in the technology that it's
>> likely to eventually be fixed.
>
> Hash indexes aren't used in the planner. Hash joins use completely
> separate code.

It's not really completely separate, because to do a hash join we have
to find a hash function for the relevant data types, and IIUC we do
that by looking up the default hash opclass for the datatype and
finding its first support function.  Of course, if we were to remove
the hash AM, then you couldn't define a hash opclass against it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Deprecating Hash Indexes

From
Andres Freund
Date:
On Monday, October 15, 2012 07:03:35 PM Simon Riggs wrote:
> On 15 October 2012 15:19, Andres Freund said...
> 
> > I vote for at least logging a wal record when a hash index is modified
> > which uses incomplete actions to set 'indisready = false' in case its
> > replayed. That should only use a rather minor amount of code and should
> > help users to find problems faster.
> 
> Good idea, though might be harder than it first appears.

> How do we issue just one of those per checkpoint, to minimise WAL?

I was thinking per checkpoint, per backend in order to not add any new locks.

> How do we make that change with a physical update WAL? Non-transactional
> update? During recovery?

Thats why I suggested using the incomplete actions/cleanup stuff, so we can do 
the change when replay finished. Thats not enough for HS though... Can we get 
away with putting a if (RecoveryInProgress()) ereport(...) in there?

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Deprecating Hash Indexes

From
Simon Riggs
Date:
On 15 October 2012 18:07, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 15, 2012 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> I don't think I'd go so far as to say that we should
>>> imply that they'll be removed in a future release.  Given how deeply
>>> intertwined they are with the planner, I doubt that that will happen;
>>> and I think there is enough interest in the technology that it's
>>> likely to eventually be fixed.
>>
>> Hash indexes aren't used in the planner. Hash joins use completely
>> separate code.
>
> It's not really completely separate, because to do a hash join we have
> to find a hash function for the relevant data types, and IIUC we do
> that by looking up the default hash opclass for the datatype and
> finding its first support function.  Of course, if we were to remove
> the hash AM, then you couldn't define a hash opclass against it.

Presumably it defaults to hash_any() but I get the picture.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Deprecating Hash Indexes

From
"ktm@rice.edu"
Date:
On Mon, Oct 15, 2012 at 10:13:24AM -0400, Robert Haas wrote:
> On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > * Put WARNINGs in the docs against the use of hash indexes, backpatch
> > to 8.3. CREATE INDEX gives no warning currently, though Index Types
> > does mention a caution.
> 
> I'd be in favor of adding such warnings to the documentation if they
> are not there already, and possibly even warning on CREATE INDEX ..
> USING hash().  I don't think I'd go so far as to say that we should
> imply that they'll be removed in a future release.  Given how deeply
> intertwined they are with the planner, I doubt that that will happen;
> and I think there is enough interest in the technology that it's
> likely to eventually be fixed.
> 
+1 for adding more warnings but do not deprecate them.

Regards,
Ken



Re: Deprecating Hash Indexes

From
Josh Berkus
Date:
Simon,

> * Put WARNINGs in the docs against the use of hash indexes, backpatch
> to 8.3. CREATE INDEX gives no warning currently, though Index Types
> does mention a caution.

I'd be in favor of a warning on create index.

Also, are hash indexes replicated?

> * Mention in the current docs that hash indexes are likely to be
> deprecated completely in future releases. Should anybody ever make
> them work, we can change that advice quickly but I don't think we're
> going to.

I'm not sure that's true, necessarily.  The nice thing about work on
hash indexes is that it's potentially rather self-contained, i.e. a good
GSOC project.  However ...

> Personally, I would like to see them removed into a contrib module to
> allow people to re-add them if they understand the risks. ISTM better
> to confiscate all foot-guns before they go off and then re-issue them
> to marksmen, at the risk of annoying the people that use them with
> full knowledge but that's likely a contentious issue.

I would be in favor of moving them to contrib for 9.4.  Assuming that
someone can figure out how this interacts with the existing system table
opclasses.  Them being in /contrib would also put less pressure on the
next new hacker who decides to take them on as a feature; they can
improve them incrementally without needing to fix 100% of issues in the
first go.

So, +1 with modifications ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Deprecating Hash Indexes

From
Andres Freund
Date:
On Monday, October 15, 2012 08:14:51 PM Josh Berkus wrote:
> > * Put WARNINGs in the docs against the use of hash indexes, backpatch
> > to 8.3. CREATE INDEX gives no warning currently, though Index Types
> > does mention a caution.
> 
> I'd be in favor of a warning on create index.
> 
> Also, are hash indexes replicated?

No. As they aren't WAL logged they can't be transported via wal based 
replication (PITR/HS/SR). Which rather quickly results in a very broken setup, 
there has been at least one bug on -bugs because of this and there have been 
several people in the irc channel experiencing this.

> > * Mention in the current docs that hash indexes are likely to be
> > deprecated completely in future releases. Should anybody ever make
> > them work, we can change that advice quickly but I don't think we're
> > going to.
> 
> I'm not sure that's true, necessarily.  The nice thing about work on
> hash indexes is that it's potentially rather self-contained, i.e. a good
> GSOC project.  However ...

While self contained I fear you still need quite a bit more knowledge than 
usual students have.

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Deprecating Hash Indexes

From
Jeff Janes
Date:
On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
>
> I would be in favor of moving them to contrib for 9.4.  Assuming that
> someone can figure out how this interacts with the existing system table
> opclasses.  Them being in /contrib would also put less pressure on the
> next new hacker who decides to take them on as a feature; they can
> improve them incrementally without needing to fix 100% of issues in the
> first go.

Is there anything currently in contrib that defines its own WAL
records and replay methods?  Are there hooks for doing so?

Cheers,

Jeff



Re: Deprecating Hash Indexes

From
Andres Freund
Date:
On Monday, October 15, 2012 08:46:40 PM Jeff Janes wrote:
> On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
> > I would be in favor of moving them to contrib for 9.4.  Assuming that
> > someone can figure out how this interacts with the existing system table
> > opclasses.  Them being in /contrib would also put less pressure on the
> > next new hacker who decides to take them on as a feature; they can
> > improve them incrementally without needing to fix 100% of issues in the
> > first go.
> 
> Is there anything currently in contrib that defines its own WAL
> records and replay methods?  Are there hooks for doing so?

It's not really possible as rmgr.c declares a const array of resource managers. 
A contrib module can't sensibly add itself to that. I think changing this has 
been discussed/proposed in the past, but -hackers wasn't convinced...

But then, the idea is to add it to -contrib while no WAL support exists..

Personally I don't see a point in -contrib'ing it. I would rather see it throw 
errors in dangerous situations and be done with that.

Regards,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Deprecating Hash Indexes

From
Simon Riggs
Date:
On 15 October 2012 19:46, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>
>> I would be in favor of moving them to contrib for 9.4.  Assuming that
>> someone can figure out how this interacts with the existing system table
>> opclasses.  Them being in /contrib would also put less pressure on the
>> next new hacker who decides to take them on as a feature; they can
>> improve them incrementally without needing to fix 100% of issues in the
>> first go.
>
> Is there anything currently in contrib that defines its own WAL
> records and replay methods?  Are there hooks for doing so?

Not to date. Search for rmgr plugins for previous discussions.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Deprecating Hash Indexes

From
Jeff Janes
Date:
On Mon, Oct 15, 2012 at 11:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Monday, October 15, 2012 08:46:40 PM Jeff Janes wrote:
>> On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> > I would be in favor of moving them to contrib for 9.4.  Assuming that
>> > someone can figure out how this interacts with the existing system table
>> > opclasses.  Them being in /contrib would also put less pressure on the
>> > next new hacker who decides to take them on as a feature; they can
>> > improve them incrementally without needing to fix 100% of issues in the
>> > first go.
>>
>> Is there anything currently in contrib that defines its own WAL
>> records and replay methods?  Are there hooks for doing so?
>
> It's not really possible as rmgr.c declares a const array of resource managers.
> A contrib module can't sensibly add itself to that. I think changing this has
> been discussed/proposed in the past, but -hackers wasn't convinced...
>
> But then, the idea is to add it to -contrib while no WAL support exists..

Which then virtually guarantees that WAL support never will exist, doesn't it?

> Personally I don't see a point in -contrib'ing it. I would rather see it throw
> errors in dangerous situations and be done with that.

+1

Cheers,

Jeff



Re: Deprecating Hash Indexes

From
"ktm@rice.edu"
Date:
On Mon, Oct 15, 2012 at 11:46:40AM -0700, Jeff Janes wrote:
> On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
> >
> > I would be in favor of moving them to contrib for 9.4.  Assuming that
> > someone can figure out how this interacts with the existing system table
> > opclasses.  Them being in /contrib would also put less pressure on the
> > next new hacker who decides to take them on as a feature; they can
> > improve them incrementally without needing to fix 100% of issues in the
> > first go.
> 
> Is there anything currently in contrib that defines its own WAL
> records and replay methods?  Are there hooks for doing so?
> 
> Cheers,
> 
> Jeff
> 

That is a good point. Please do not move it to contrib if that will make
it even harder/impossible to add WAL support.

Regards,
Ken



Re: Deprecating Hash Indexes

From
Simon Riggs
Date:
On 15 October 2012 20:04, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Oct 15, 2012 at 11:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> On Monday, October 15, 2012 08:46:40 PM Jeff Janes wrote:
>>> On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>> > I would be in favor of moving them to contrib for 9.4.  Assuming that
>>> > someone can figure out how this interacts with the existing system table
>>> > opclasses.  Them being in /contrib would also put less pressure on the
>>> > next new hacker who decides to take them on as a feature; they can
>>> > improve them incrementally without needing to fix 100% of issues in the
>>> > first go.
>>>
>>> Is there anything currently in contrib that defines its own WAL
>>> records and replay methods?  Are there hooks for doing so?
>>
>> It's not really possible as rmgr.c declares a const array of resource managers.
>> A contrib module can't sensibly add itself to that. I think changing this has
>> been discussed/proposed in the past, but -hackers wasn't convinced...
>>
>> But then, the idea is to add it to -contrib while no WAL support exists..
>
> Which then virtually guarantees that WAL support never will exist, doesn't it?
>
>> Personally I don't see a point in -contrib'ing it. I would rather see it throw
>> errors in dangerous situations and be done with that.
>
> +1

All I'm planning to do for now is add docs and the WARNING suggested.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Deprecating Hash Indexes

From
Peter Eisentraut
Date:
On Mon, 2012-10-15 at 11:14 -0700, Josh Berkus wrote:
> I'd be in favor of a warning on create index.

Only if you can turn it off, please.

But I don't think a warning is appropriate if the statement does exactly
what the user wanted.  The place to point out shortcomings of the
implementation is in the documentation.




Re: Deprecating Hash Indexes

From
Simon Riggs
Date:
On 16 October 2012 04:49, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Mon, 2012-10-15 at 11:14 -0700, Josh Berkus wrote:
>> I'd be in favor of a warning on create index.
>
> Only if you can turn it off, please.
>
> But I don't think a warning is appropriate if the statement does exactly
> what the user wanted.  The place to point out shortcomings of the
> implementation is in the documentation.

Fair comment. I've just added a caution in the docs.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services