Thread: Deprecating Hash Indexes
As discussed on other threads, Hash Indexes are currently a broken feature and bring us into disrepute. I describe them as broken because they are neither crash safe, nor could they properly be called unlogged indexes (or if so, why just hash?). And also because if somebody suggested implementing them the way they are now, they would be told they were insane because silent data corruption is not something we tolerate anymore. We know why they are like that, but its time to remove the rest of the historical research legacy. It's hard to say "We respect your data [except if you press here]" and be taken seriously. Suggested actions are * Put WARNINGs in the docs against the use of hash indexes, backpatch to 8.3. CREATE INDEX gives no warning currently, though Index Types does mention a caution. * Mention in the current docs that hash indexes are likely to be deprecated completely in future releases. Should anybody ever make them work, we can change that advice quickly but I don't think we're going to. Personally, I would like to see them removed into a contrib module to allow people to re-add them if they understand the risks. ISTM better to confiscate all foot-guns before they go off and then re-issue them to marksmen, at the risk of annoying the people that use them with full knowledge but that's likely a contentious issue. Alternate views? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/14/2012 09:45 AM, Simon Riggs wrote: > As discussed on other threads, Hash Indexes are currently a broken > feature and bring us into disrepute. > > I describe them as broken because they are neither crash safe, nor > could they properly be called unlogged indexes (or if so, why just > hash?). And also because if somebody suggested implementing them the > way they are now, they would be told they were insane because silent > data corruption is not something we tolerate anymore. We know why they > are like that, but its time to remove the rest of the historical > research legacy. It's hard to say "We respect your data [except if you > press here]" and be taken seriously. > > Suggested actions are > > * Put WARNINGs in the docs against the use of hash indexes, backpatch > to 8.3. CREATE INDEX gives no warning currently, though Index Types > does mention a caution. > > * Mention in the current docs that hash indexes are likely to be > deprecated completely in future releases. Should anybody ever make > them work, we can change that advice quickly but I don't think we're > going to. > > Personally, I would like to see them removed into a contrib module to > allow people to re-add them if they understand the risks. ISTM better > to confiscate all foot-guns before they go off and then re-issue them > to marksmen, at the risk of annoying the people that use them with > full knowledge but that's likely a contentious issue. > > Alternate views? I think we'd be better off to start by implementing unlogged indexes generally. I have a client who is using hash indexes for the performance gain on a very heavy insert load precisely because they are unlogged, and who can get away with it because all their index lookup requirements are equality. They are aware of the consequences of a crash, and can manage the risk accordingly. But there is a lot of attraction in the idea of unlogged btree indexes for example. And the danger of an unlogged index is substantially less than that of an unlogged table. After all, your data is still there, quite crash-safe, and you can thus just rebuild the index after a crash. cheers andrew
Simon Riggs <simon@2ndQuadrant.com> writes: > As discussed on other threads, Hash Indexes are currently a broken > feature and bring us into disrepute. Yeah ... > Suggested actions are I find it curious that you don't even bother to list "add WAL support to hash indexes" as a possible solution. It's certainly doable, and probably not even very much more work than something like moving hash support to a contrib module would be. (Assuming that's even possible, which I doubt, because the core code relies on hash index opclasses even if not on hash indexes.) regards, tom lane
On 14 October 2012 17:47, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> As discussed on other threads, Hash Indexes are currently a broken >> feature and bring us into disrepute. > > Yeah ... > >> Suggested actions are > > I find it curious that you don't even bother to list "add WAL support to > hash indexes" as a possible solution. It's certainly doable, and > probably not even very much more work than something like moving hash > support to a contrib module would be. (Assuming that's even possible, > which I doubt, because the core code relies on hash index opclasses even > if not on hash indexes.) It is doable, yes. I think that's at least 2-3 months total work, much of it low-level bug fixing and eventual production bug hunting. Its gone for 9.3 already, so the earliest this would see production code is Sept 2014. It's a task beyond most people's ability and beyond the range of professionals without funding. I don't see anyone funding a medium-low importance feature ahead of the higher ones out there, so I don't expect the situation to change for (more) years. If you personally have time to spare on performance features, there are many optimizer tasks more worthy of your attention and more popular also, so I don't ask for a code sprint on this. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > * Put WARNINGs in the docs against the use of hash indexes, backpatch > to 8.3. CREATE INDEX gives no warning currently, though Index Types > does mention a caution. I'd be in favor of adding such warnings to the documentation if they are not there already, and possibly even warning on CREATE INDEX .. USING hash(). I don't think I'd go so far as to say that we should imply that they'll be removed in a future release. Given how deeply intertwined they are with the planner, I doubt that that will happen; and I think there is enough interest in the technology that it's likely to eventually be fixed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sunday, October 14, 2012 03:45:49 PM Simon Riggs wrote: > As discussed on other threads, Hash Indexes are currently a broken > feature and bring us into disrepute. > > I describe them as broken because they are neither crash safe, nor > could they properly be called unlogged indexes (or if so, why just > hash?). And also because if somebody suggested implementing them the > way they are now, they would be told they were insane because silent > data corruption is not something we tolerate anymore. We know why they > are like that, but its time to remove the rest of the historical > research legacy. It's hard to say "We respect your data [except if you > press here]" and be taken seriously. > > Suggested actions are > > * Put WARNINGs in the docs against the use of hash indexes, backpatch > to 8.3. CREATE INDEX gives no warning currently, though Index Types > does mention a caution. > > * Mention in the current docs that hash indexes are likely to be > deprecated completely in future releases. Should anybody ever make > them work, we can change that advice quickly but I don't think we're > going to. > > Personally, I would like to see them removed into a contrib module to > allow people to re-add them if they understand the risks. ISTM better > to confiscate all foot-guns before they go off and then re-issue them > to marksmen, at the risk of annoying the people that use them with > full knowledge but that's likely a contentious issue. > > Alternate views? I vote for at least logging a wal record when a hash index is modified which uses incomplete actions to set 'indisready = false' in case its replayed. That should only use a rather minor amount of code and should help users to find problems faster. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 15 October 2012 15:13, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> * Put WARNINGs in the docs against the use of hash indexes, backpatch >> to 8.3. CREATE INDEX gives no warning currently, though Index Types >> does mention a caution. > > I'd be in favor of adding such warnings to the documentation if they > are not there already, and possibly even warning on CREATE INDEX .. > USING hash(). Sounds like a good idea. > I don't think I'd go so far as to say that we should > imply that they'll be removed in a future release. Given how deeply > intertwined they are with the planner, I doubt that that will happen; > and I think there is enough interest in the technology that it's > likely to eventually be fixed. Hash indexes aren't used in the planner. Hash joins use completely separate code. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 15 October 2012 15:19, Andres Freund said... > I vote for at least logging a wal record when a hash index is modified which > uses incomplete actions to set 'indisready = false' in case its replayed. That > should only use a rather minor amount of code and should help users to find > problems faster. Good idea, though might be harder than it first appears. How do we issue just one of those per checkpoint, to minimise WAL? How do we make that change with a physical update WAL? Non-transactional update? During recovery? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 15, 2012 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I don't think I'd go so far as to say that we should >> imply that they'll be removed in a future release. Given how deeply >> intertwined they are with the planner, I doubt that that will happen; >> and I think there is enough interest in the technology that it's >> likely to eventually be fixed. > > Hash indexes aren't used in the planner. Hash joins use completely > separate code. It's not really completely separate, because to do a hash join we have to find a hash function for the relevant data types, and IIUC we do that by looking up the default hash opclass for the datatype and finding its first support function. Of course, if we were to remove the hash AM, then you couldn't define a hash opclass against it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Monday, October 15, 2012 07:03:35 PM Simon Riggs wrote: > On 15 October 2012 15:19, Andres Freund said... > > > I vote for at least logging a wal record when a hash index is modified > > which uses incomplete actions to set 'indisready = false' in case its > > replayed. That should only use a rather minor amount of code and should > > help users to find problems faster. > > Good idea, though might be harder than it first appears. > How do we issue just one of those per checkpoint, to minimise WAL? I was thinking per checkpoint, per backend in order to not add any new locks. > How do we make that change with a physical update WAL? Non-transactional > update? During recovery? Thats why I suggested using the incomplete actions/cleanup stuff, so we can do the change when replay finished. Thats not enough for HS though... Can we get away with putting a if (RecoveryInProgress()) ereport(...) in there? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 15 October 2012 18:07, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Oct 15, 2012 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> I don't think I'd go so far as to say that we should >>> imply that they'll be removed in a future release. Given how deeply >>> intertwined they are with the planner, I doubt that that will happen; >>> and I think there is enough interest in the technology that it's >>> likely to eventually be fixed. >> >> Hash indexes aren't used in the planner. Hash joins use completely >> separate code. > > It's not really completely separate, because to do a hash join we have > to find a hash function for the relevant data types, and IIUC we do > that by looking up the default hash opclass for the datatype and > finding its first support function. Of course, if we were to remove > the hash AM, then you couldn't define a hash opclass against it. Presumably it defaults to hash_any() but I get the picture. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 15, 2012 at 10:13:24AM -0400, Robert Haas wrote: > On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > * Put WARNINGs in the docs against the use of hash indexes, backpatch > > to 8.3. CREATE INDEX gives no warning currently, though Index Types > > does mention a caution. > > I'd be in favor of adding such warnings to the documentation if they > are not there already, and possibly even warning on CREATE INDEX .. > USING hash(). I don't think I'd go so far as to say that we should > imply that they'll be removed in a future release. Given how deeply > intertwined they are with the planner, I doubt that that will happen; > and I think there is enough interest in the technology that it's > likely to eventually be fixed. > +1 for adding more warnings but do not deprecate them. Regards, Ken
Simon, > * Put WARNINGs in the docs against the use of hash indexes, backpatch > to 8.3. CREATE INDEX gives no warning currently, though Index Types > does mention a caution. I'd be in favor of a warning on create index. Also, are hash indexes replicated? > * Mention in the current docs that hash indexes are likely to be > deprecated completely in future releases. Should anybody ever make > them work, we can change that advice quickly but I don't think we're > going to. I'm not sure that's true, necessarily. The nice thing about work on hash indexes is that it's potentially rather self-contained, i.e. a good GSOC project. However ... > Personally, I would like to see them removed into a contrib module to > allow people to re-add them if they understand the risks. ISTM better > to confiscate all foot-guns before they go off and then re-issue them > to marksmen, at the risk of annoying the people that use them with > full knowledge but that's likely a contentious issue. I would be in favor of moving them to contrib for 9.4. Assuming that someone can figure out how this interacts with the existing system table opclasses. Them being in /contrib would also put less pressure on the next new hacker who decides to take them on as a feature; they can improve them incrementally without needing to fix 100% of issues in the first go. So, +1 with modifications ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Monday, October 15, 2012 08:14:51 PM Josh Berkus wrote: > > * Put WARNINGs in the docs against the use of hash indexes, backpatch > > to 8.3. CREATE INDEX gives no warning currently, though Index Types > > does mention a caution. > > I'd be in favor of a warning on create index. > > Also, are hash indexes replicated? No. As they aren't WAL logged they can't be transported via wal based replication (PITR/HS/SR). Which rather quickly results in a very broken setup, there has been at least one bug on -bugs because of this and there have been several people in the irc channel experiencing this. > > * Mention in the current docs that hash indexes are likely to be > > deprecated completely in future releases. Should anybody ever make > > them work, we can change that advice quickly but I don't think we're > > going to. > > I'm not sure that's true, necessarily. The nice thing about work on > hash indexes is that it's potentially rather self-contained, i.e. a good > GSOC project. However ... While self contained I fear you still need quite a bit more knowledge than usual students have. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote: > > I would be in favor of moving them to contrib for 9.4. Assuming that > someone can figure out how this interacts with the existing system table > opclasses. Them being in /contrib would also put less pressure on the > next new hacker who decides to take them on as a feature; they can > improve them incrementally without needing to fix 100% of issues in the > first go. Is there anything currently in contrib that defines its own WAL records and replay methods? Are there hooks for doing so? Cheers, Jeff
On Monday, October 15, 2012 08:46:40 PM Jeff Janes wrote: > On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote: > > I would be in favor of moving them to contrib for 9.4. Assuming that > > someone can figure out how this interacts with the existing system table > > opclasses. Them being in /contrib would also put less pressure on the > > next new hacker who decides to take them on as a feature; they can > > improve them incrementally without needing to fix 100% of issues in the > > first go. > > Is there anything currently in contrib that defines its own WAL > records and replay methods? Are there hooks for doing so? It's not really possible as rmgr.c declares a const array of resource managers. A contrib module can't sensibly add itself to that. I think changing this has been discussed/proposed in the past, but -hackers wasn't convinced... But then, the idea is to add it to -contrib while no WAL support exists.. Personally I don't see a point in -contrib'ing it. I would rather see it throw errors in dangerous situations and be done with that. Regards, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 15 October 2012 19:46, Jeff Janes <jeff.janes@gmail.com> wrote: > On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote: >> >> I would be in favor of moving them to contrib for 9.4. Assuming that >> someone can figure out how this interacts with the existing system table >> opclasses. Them being in /contrib would also put less pressure on the >> next new hacker who decides to take them on as a feature; they can >> improve them incrementally without needing to fix 100% of issues in the >> first go. > > Is there anything currently in contrib that defines its own WAL > records and replay methods? Are there hooks for doing so? Not to date. Search for rmgr plugins for previous discussions. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 15, 2012 at 11:49 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On Monday, October 15, 2012 08:46:40 PM Jeff Janes wrote: >> On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote: >> > I would be in favor of moving them to contrib for 9.4. Assuming that >> > someone can figure out how this interacts with the existing system table >> > opclasses. Them being in /contrib would also put less pressure on the >> > next new hacker who decides to take them on as a feature; they can >> > improve them incrementally without needing to fix 100% of issues in the >> > first go. >> >> Is there anything currently in contrib that defines its own WAL >> records and replay methods? Are there hooks for doing so? > > It's not really possible as rmgr.c declares a const array of resource managers. > A contrib module can't sensibly add itself to that. I think changing this has > been discussed/proposed in the past, but -hackers wasn't convinced... > > But then, the idea is to add it to -contrib while no WAL support exists.. Which then virtually guarantees that WAL support never will exist, doesn't it? > Personally I don't see a point in -contrib'ing it. I would rather see it throw > errors in dangerous situations and be done with that. +1 Cheers, Jeff
On Mon, Oct 15, 2012 at 11:46:40AM -0700, Jeff Janes wrote: > On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote: > > > > I would be in favor of moving them to contrib for 9.4. Assuming that > > someone can figure out how this interacts with the existing system table > > opclasses. Them being in /contrib would also put less pressure on the > > next new hacker who decides to take them on as a feature; they can > > improve them incrementally without needing to fix 100% of issues in the > > first go. > > Is there anything currently in contrib that defines its own WAL > records and replay methods? Are there hooks for doing so? > > Cheers, > > Jeff > That is a good point. Please do not move it to contrib if that will make it even harder/impossible to add WAL support. Regards, Ken
On 15 October 2012 20:04, Jeff Janes <jeff.janes@gmail.com> wrote: > On Mon, Oct 15, 2012 at 11:49 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> On Monday, October 15, 2012 08:46:40 PM Jeff Janes wrote: >>> On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote: >>> > I would be in favor of moving them to contrib for 9.4. Assuming that >>> > someone can figure out how this interacts with the existing system table >>> > opclasses. Them being in /contrib would also put less pressure on the >>> > next new hacker who decides to take them on as a feature; they can >>> > improve them incrementally without needing to fix 100% of issues in the >>> > first go. >>> >>> Is there anything currently in contrib that defines its own WAL >>> records and replay methods? Are there hooks for doing so? >> >> It's not really possible as rmgr.c declares a const array of resource managers. >> A contrib module can't sensibly add itself to that. I think changing this has >> been discussed/proposed in the past, but -hackers wasn't convinced... >> >> But then, the idea is to add it to -contrib while no WAL support exists.. > > Which then virtually guarantees that WAL support never will exist, doesn't it? > >> Personally I don't see a point in -contrib'ing it. I would rather see it throw >> errors in dangerous situations and be done with that. > > +1 All I'm planning to do for now is add docs and the WARNING suggested. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, 2012-10-15 at 11:14 -0700, Josh Berkus wrote: > I'd be in favor of a warning on create index. Only if you can turn it off, please. But I don't think a warning is appropriate if the statement does exactly what the user wanted. The place to point out shortcomings of the implementation is in the documentation.
On 16 October 2012 04:49, Peter Eisentraut <peter_e@gmx.net> wrote: > On Mon, 2012-10-15 at 11:14 -0700, Josh Berkus wrote: >> I'd be in favor of a warning on create index. > > Only if you can turn it off, please. > > But I don't think a warning is appropriate if the statement does exactly > what the user wanted. The place to point out shortcomings of the > implementation is in the documentation. Fair comment. I've just added a caution in the docs. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services