Thread: Preventing index scans for non-recoverable index AMs
Hot Standby won't work with hash indexes because they are non-recoverable. We have a number of ways of dealing with this: 1. Workaround: Implement WAL for hash indexes 2. Specific Solution: make hashcostestimate() massively increase cost of scans during recovery so that they will very seldom be picked and make hashbeginscan() refuse scans during recovery in case they do happen 3. Generic Solution: add amisrecoverable flag to pg_am, and alter generic index AM to check whether index is recoverable before selecting it during planning I was hoping to do (1), but I'm looking for a quicker solution now so we can tie up loose ends for review. (2) seems most appropriate, since it will last only until (1) is complete in a later release. Anyone see additional options or prefer alternatives? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Wed, Dec 17, 2008 at 4:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Hot Standby won't work with hash indexes because they are > non-recoverable. > > We have a number of ways of dealing with this: > i don't see a reason for inventing the wheel, we don't have wal for hash indexes because makes those more slow without any benefit at all... now there will be one... -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
On Wed, Dec 17, 2008 at 05:42:41PM -0500, Jaime Casanova wrote: > On Wed, Dec 17, 2008 at 4:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Hot Standby won't work with hash indexes because they are > > non-recoverable. > > > > We have a number of ways of dealing with this: > > > > i don't see a reason for inventing the wheel, we don't have wal for > hash indexes because makes those more slow without any benefit at > all... now there will be one... > > -- > Atentamente, > Jaime Casanova > Soporte y capacitaci?n de PostgreSQL > Asesor?a y desarrollo de sistemas > Guayaquil - Ecuador > Cel. +59387171157 > I think having your index survive a server power outage or other crash is a very good thing. Rebuilding a hash index for the case for which it is preferred (large, large tables) would be excrutiating. Ken
On Wed, 2008-12-17 at 17:42 -0500, Jaime Casanova wrote: > On Wed, Dec 17, 2008 at 4:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Hot Standby won't work with hash indexes because they are > > non-recoverable. > > > > We have a number of ways of dealing with this: > > > > i don't see a reason for inventing the wheel, we don't have wal for > hash indexes because makes those more slow without any benefit at > all... now there will be one... Well, we're running short of time for 8.4 to put it mildly, so option (1) is not on my radar. Even if somebody wrote WAL support for hash indexes right now, I would be much happier with my other two suggestions from a robustness perspective. We don't yet have a mechanism for an index AM to say "damn, this index is screwed up, don't use it". So a rushed implementation of WAL support would be counterproductive, ISTM. So it's either (2), (3) or another option. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Wed, 2008-12-17 at 16:47 -0600, Kenneth Marshall wrote: > I think having your index survive a server power outage or other > crash is a very good thing. Rebuilding a hash index for the case > for which it is preferred (large, large tables) would be excrutiating. Completely agree. We may be outta time to make it happen. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Wed, Dec 17, 2008 at 5:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > We don't yet have a mechanism for an > index AM to say "damn, this index is screwed up, don't use it". > mark pg_index.indisvalid and/or pg_index.indisready to false in the hot standby node? -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
On Wed, Dec 17, 2008 at 10:58:11PM +0000, Simon Riggs wrote: > > On Wed, 2008-12-17 at 16:47 -0600, Kenneth Marshall wrote: > > > I think having your index survive a server power outage or other > > crash is a very good thing. Rebuilding a hash index for the case > > for which it is preferred (large, large tables) would be excrutiating. > > Completely agree. > > We may be outta time to make it happen. > I agree. I was working on adding the WAL and ran up against the deadline. A rushed hash WAL implementation would be worse than the other alternatives. I plan on picking it back up after 8.4 is out the door. Regards, Ken
On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm@rice.edu> wrote: > Rebuilding a hash index for the case > for which it is preferred (large, large tables) would be excrutiating. > there's such a situation? -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote: > On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm@rice.edu> wrote: > > Rebuilding a hash index for the case > > for which it is preferred (large, large tables) would be excrutiating. > > > > there's such a situation? > As of 8.4, yes. Ken
On Wed, Dec 17, 2008 at 05:10:40PM -0600, Kenneth Marshall wrote: > On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote: > > On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm@rice.edu> wrote: > > > Rebuilding a hash index for the case > > > for which it is preferred (large, large tables) would be excrutiating. > > > > > > > there's such a situation? > > > As of 8.4, yes. > In addition, hash indexes can index items larger than the 1/3 page limit of btree indexes. Ken
On Wed, 2008-12-17 at 18:01 -0500, Jaime Casanova wrote: > On Wed, Dec 17, 2008 at 5:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > > We don't yet have a mechanism for an > > index AM to say "damn, this index is screwed up, don't use it". > > > > mark pg_index.indisvalid and/or pg_index.indisready to false in the > hot standby node? We can't edit the database until recovery is over, so that doesn't help us while in recovery mode. So not an option. It doesn't help us after recovery mode either because there is no infrastructure (yet) for an index AM's rmgr to exact a transaction after completion of recovery. So yes, that is the way it will be done, but there is a non-trivial effort to allow it to happen. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > > On Wed, 2008-12-17 at 18:01 -0500, Jaime Casanova wrote: > > On Wed, Dec 17, 2008 at 5:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > > > > We don't yet have a mechanism for an > > > index AM to say "damn, this index is screwed up, don't use it". > > > > mark pg_index.indisvalid and/or pg_index.indisready to false in the > > hot standby node? > > We can't edit the database until recovery is over, so that doesn't help > us while in recovery mode. So not an option. Maybe we should add a WAL record that's the physical representation for "mark this index invalid", and have any transaction that modifies a hash index write that to WAL. It should be simple code to write, because the underlying replay is based on a regular heap update. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Wed, 2008-12-17 at 21:23 -0300, Alvaro Herrera wrote: > Simon Riggs wrote: > > > > On Wed, 2008-12-17 at 18:01 -0500, Jaime Casanova wrote: > > > On Wed, Dec 17, 2008 at 5:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > > > > > > We don't yet have a mechanism for an > > > > index AM to say "damn, this index is screwed up, don't use it". > > > > > > mark pg_index.indisvalid and/or pg_index.indisready to false in the > > > hot standby node? > > > > We can't edit the database until recovery is over, so that doesn't help > > us while in recovery mode. So not an option. > > Maybe we should add a WAL record that's the physical representation for > "mark this index invalid", and have any transaction that modifies a hash > index write that to WAL. It should be simple code to write, because > the underlying replay is based on a regular heap update. Doesn't sound like it would work. It doesn't really matter how you *decide* to do this, it's when you do this that counts. What we need is a way for recovery to remember a list of pending actions that can then be issued in a transaction at the end of recovery. As you pointed out, the Startup process cannot issue transactions, so that means this is harder than it should be. Short route is to: * allow Startup process to run transactions (when recovery finished) * introduce another rmgr API call that gets called in its own transaction at the end of recovery Longer route is to * allow startup process to queue up work following recovery * have another process (autovac-ish) get spawned immediately after recovery to read the list and execute - this would allow us to startup quickly even if the rmgr decided to mark index invalid and then completely rebuild the index. Which sounds like a major project in itself. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
All, Are we really releasing an index type without recoverability for 8.4? Will this be in /contrib? --Josh
On Wed, 2008-12-17 at 17:10 -0600, Kenneth Marshall wrote: > On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote: > > On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm@rice.edu> wrote: > > > Rebuilding a hash index for the case > > > for which it is preferred (large, large tables) would be excrutiating. > > > > > > > there's such a situation? > > > As of 8.4, yes. > My understanding was that the hash index type never supported recoverability, and could require a rebuild on power failure. If it's not written to WAL before the data page changes, how could it be safe for recovery? The tuple inserts are logged, so during recovery the tuple would be put in the table but the index would not be updated. What am I missing? Regards,Jeff Davis
On Wed, 2008-12-17 at 18:09 -0800, Josh Berkus wrote: > Are we really releasing an index type without recoverability for 8.4? > Will this be in /contrib? Worse than that, I'm talking about hash indexes (which are already here, without WAL) I would not contemplate a new index type without WAL. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Wed, 2008-12-17 at 21:26 +0000, Simon Riggs wrote: > Hot Standby won't work with hash indexes because they are > non-recoverable. > > We have a number of ways of dealing with this: > 2. Specific Solution: make hashcostestimate() massively increase cost of > scans during recovery so that they will very seldom be picked and make > hashbeginscan() refuse scans during recovery in case they do happen > (2) seems most appropriate, since it will last only until (1) is > complete in a later release. Code *fragment* enclosed here for discussion. (Not an independent patch) -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Attachment
On Wed, 2008-12-17 at 18:20 -0800, Jeff Davis wrote: > On Wed, 2008-12-17 at 17:10 -0600, Kenneth Marshall wrote: > > On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote: > > > On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm@rice.edu> wrote: > > > > Rebuilding a hash index for the case > > > > for which it is preferred (large, large tables) would be excrutiating. > > > > > > > > > > there's such a situation? > > > > > As of 8.4, yes. > > > > My understanding was that the hash index type never supported > recoverability, and could require a rebuild on power failure. > > If it's not written to WAL before the data page changes, how could it be > safe for recovery? The tuple inserts are logged, so during recovery the > tuple would be put in the table but the index would not be updated. > > What am I missing? > On second read, it occurs to me that you may have meant: "as of 8.4, hash indexes have never been safe" but I read it as: "as of 8.4, hash indexes will require rebuild on crash, whereas that was unnecessary before 8.4". If you meant the former, you can disregard my question. Regards,Jeff Davis
Simon Riggs wrote: > > On Wed, 2008-12-17 at 21:23 -0300, Alvaro Herrera wrote: > > Maybe we should add a WAL record that's the physical representation for > > "mark this index invalid", and have any transaction that modifies a hash > > index write that to WAL. It should be simple code to write, because > > the underlying replay is based on a regular heap update. > > Doesn't sound like it would work. It doesn't really matter how you > *decide* to do this, it's when you do this that counts. Hmm, it doesn't seem like you understood my suggestion ... basically I'm saying that a hash index insert/delete should put out this WAL record: HEAP update address-of-pg_index-tuple set indisvalid=false (I'm just guessing at indisvalid but you get my point) No need to remember anything. Of course, the user then needs to fix the index after the fact. Of course, for 8.5 we would do something smarter. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Wed, 2008-12-17 at 23:28 -0300, Alvaro Herrera wrote: > Hmm, it doesn't seem like you understood my suggestion ... basically I'm > saying that a hash index insert/delete should put out this WAL record: > > HEAP update address-of-pg_index-tuple set indisvalid=false > > (I'm just guessing at indisvalid but you get my point) That would be simple and I'm very sorry to say I still don't think it would work. But yes, I did misunderstand you. In-progress hash index scans would not be prevented from executing by the WAL record, so you might end up following a bad pointer. We probably wouldn't want to try killing anybody using index either, since that would end up as a complete bloodbath. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Jeff Davis wrote: > On Wed, 2008-12-17 at 17:10 -0600, Kenneth Marshall wrote: >> On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote: >>> On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm@rice.edu> wrote: >>>> Rebuilding a hash index for the case >>>> for which it is preferred (large, large tables) would be excrutiating. >>>> >>> there's such a situation? >>> >> As of 8.4, yes. > > My understanding was that the hash index type never supported > recoverability, and could require a rebuild on power failure. Right, this is certainly not a new problem. It's not even a new problem in the context of replication or hot standby, because we already have the problem with PITR and file-based log shipping. Also, it's not just a problem *during* the recovery. The index is just as corrupt after the recovery has finished. I think we should just leave it alone for 8.4, and fix it properly in a future relase by implementing WAL-logging for hash indexes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2008-12-18 at 08:29 +0200, Heikki Linnakangas wrote: > Right, this is certainly not a new problem. It's not even a new problem > in the context of replication or hot standby, because we already have > the problem with PITR and file-based log shipping. > > Also, it's not just a problem *during* the recovery. The index is just > as corrupt after the recovery has finished. Agreed. > I think we should just leave it alone for 8.4, and fix it properly in a > future relase by implementing WAL-logging for hash indexes. You really think we should just leave it alone? That gives me less work, so I will accept that if you think so. Gives me the shivers though. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: >> I think we should just leave it alone for 8.4, and fix it properly in a >> future relase by implementing WAL-logging for hash indexes. > > You really think we should just leave it alone? That gives me less work, > so I will accept that if you think so. Gives me the shivers though. At least you ought to feel free to treat it as an independent problem from your current project.
On Thu, 2008-12-18 at 13:52 +0200, Peter Eisentraut wrote: > Simon Riggs wrote: > >> I think we should just leave it alone for 8.4, and fix it properly in a > >> future relase by implementing WAL-logging for hash indexes. > > > > You really think we should just leave it alone? That gives me less work, > > so I will accept that if you think so. Gives me the shivers though. > > At least you ought to feel free to treat it as an independent problem > from your current project. OK, good. Some weight off my shoulders :-) -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Thu, Dec 18, 2008 at 11:59 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > > Right, this is certainly not a new problem. It's not even a new problem in > the context of replication or hot standby, because we already have the > problem with PITR and file-based log shipping. > > Also, it's not just a problem *during* the recovery. The index is just as > corrupt after the recovery has finished. > Just curious, how do we handle the case of corrupted hash index today ? If we can detect that the index is corrupt because of bad page headers etc, then its still OK; we can throw an error. But what if the hash index is used after recovery and it returns wrong tuple(s) ? Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote: > Just curious, how do we handle the case of corrupted hash index today? We don't. > If we can detect that the index is corrupt because of bad page > headers etc, then its still OK; we can throw an error. But what if the > hash index is used after recovery and it returns wrong tuple(s) ? You get to keep both pieces.. In short, don't use hash index, unless you're prepared to run REINDEX manually after every crash. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Dec 18, 2008 at 6:02 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > > In short, don't use hash index, unless you're prepared to run REINDEX > manually after every crash. > I think that should be mentioned in *bold* letters in the documentation. The doc currently has the following: "so hash indexes might need to be rebuilt with REINDEX after a database crash" This isn't a strong statement. How would we handle automatic recovery where user may not even get chance to run REINDEX before his database is corrupted ? Hot standby will fail miserably with hash indexes since the index would be completely useless at the standby (but planner will nevertheless try to use it), IMHO either hash index should not be supported at all or should be WAL logged and properly handled in presence of hot standby. BTW, if there is no proven case where hash index works significantly better than btree (that's what the doc says), why not just completely abandon it ? Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote: > BTW, if there is no proven case where hash index works significantly > better than btree (that's what the doc says), why not just completely > abandon it ? That has been considered many times, see archives. I believe the changes done in 8.4 actually made it faster for some cases. And as Kenneth pointed out hash indexes can handle keys larger than 1/3 of page size, that b-tree can't. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
It would be perfectly reasonable to add an amisrecoverable like Simon described. It could automatically set indisvalid to false after a crash and treat the index as if indisvalid is false during recovery. That would be a lot smoother and safer than what we have now. It might even be possible to do this with a new wal record type so it only happens if there was a write to the index. I imagine most users who read that warning and use hash indexes anyways are using them on read-only tables where they know it's safe. -- Greg On 18 Dec 2008, at 07:51, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com > wrote: > Pavan Deolasee wrote: >> BTW, if there is no proven case where hash index works significantly >> better than btree (that's what the doc says), why not just completely >> abandon it ? > > That has been considered many times, see archives. I believe the > changes done in 8.4 actually made it faster for some cases. And as > Kenneth pointed out hash indexes can handle keys larger than 1/3 of > page size, that b-tree can't. > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
Greg Stark wrote: > It would be perfectly reasonable to add an amisrecoverable like Simon > described. It could automatically set indisvalid to false after a crash > and treat the index as if indisvalid is false during recovery. That > would be a lot smoother and safer than what we have now. > > It might even be possible to do this with a new wal record type so it > only happens if there was a write to the index. I imagine most users who > read that warning and use hash indexes anyways are using them on > read-only tables where they know it's safe. This is essentially Alvaro's suggestions, which Simon has already given a counterargument to.
Peter Eisentraut <peter_e@gmx.net> writes: > Greg Stark wrote: >> It would be perfectly reasonable to add an amisrecoverable like Simon >> described. It could automatically set indisvalid to false after a crash >> and treat the index as if indisvalid is false during recovery. That >> would be a lot smoother and safer than what we have now. >> >> It might even be possible to do this with a new wal record type so it >> only happens if there was a write to the index. I imagine most users who >> read that warning and use hash indexes anyways are using them on >> read-only tables where they know it's safe. > This is essentially Alvaro's suggestions, which Simon has already given > a counterargument to. The long and the short of it is that the reason hash indexes still don't have WAL support is no one's seen fit to do the work. I do not see the point of proposing to expend work to substitute for that work. I think all that ought to be done here is document that hash indexes shouldn't be used in a replication or PITR environment. regards, tom lane