Thread: bug of recovery?
Hi, Currently, if a reference to an invalid page is found during recovery, its information is saved in the hash table "invalid_page_tab". Then, if such a reference is resolved, its information is removed from the hash table. If an unresolved reference to an invalid page remains in the hash table at the end of recovery, a PANIC error occurs. What I'm worried about is that the hash table is volatile. If a user restarts the server before reaching the end of recovery, any information in the hash table is lost, and we wrongly miss the PANIC error case because we cannot find any unresolved reference. That is, even if the database is corrupted at the end of recovery, a user might not be able to notice that. This looks like a serious problem. No? To prevent the above problem, we should write the contents of the hash table to disk at every restartpoint, I think. Then, when the server starts recovery, it should reload the hash table from the disk. Thoughts? Am I missing something? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
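The lifecycle described above can be sketched in a few lines. This is an illustrative Python model of the protocol only (the real implementation is C, in src/backend/access/transam/xlogutils.c); the method names merely mirror the C functions, and the real table is keyed by (RelFileNode, fork, block number) rather than plain strings:

```python
# Illustrative model of the invalid_page_tab protocol. Not the actual
# server code; behaviour is simplified for exposition.

class InvalidPageTable:
    def __init__(self):
        self.entries = set()  # {(relation, block_number)}

    def log_invalid_page(self, rel, blkno):
        # Replay touched a page whose file (or block) doesn't exist:
        # remember the reference instead of failing, and keep replaying.
        self.entries.add((rel, blkno))

    def forget_invalid_pages(self, rel):
        # A later drop/truncate record explains the missing file, so
        # unresolved references to this relation are now resolved.
        self.entries = {e for e in self.entries if e[0] != rel}

    def check_invalid_pages(self):
        # XLogCheckInvalidPages(): at the end of recovery, any leftover
        # entry means the database is inconsistent -> PANIC.
        if self.entries:
            raise RuntimeError("PANIC: invalid pages remain after recovery")

tab = InvalidPageTable()
tab.log_invalid_page("table_x", 42)   # insert record hits a missing file
tab.forget_invalid_pages("table_x")   # drop-table record resolves it
tab.check_invalid_pages()             # table empty: recovery is clean
```

Fujii's point is that `tab` lives only in the startup process's memory, so a restart between `log_invalid_page` and the resolving record silently empties it.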
On Sep26, 2011, at 11:50 , Fujii Masao wrote: > Currently, if a reference to an invalid page is found during recovery, > its information > is saved in hash table "invalid_page_tab". Then, if such a reference > is resolved, > its information is removed from the hash table. If there is unresolved > reference to > an invalid page in the hash table at the end of recovery, PANIC error occurs. > > What I'm worried about is that the hash table is volatile. If a user restarts > the server before reaching end of recovery, any information in the > hash table is lost, > and we wrongly miss the PANIC error case because we cannot find any unresolved > reference. That is, even if database is corrupted at the end of recovery, > a user might not be able to notice that. This looks like a serious problem. No? > > To prevent the above problem, we should write the contents of the hash table to > the disk for every restartpoints, I think. Then, when the server > starts recovery, > it should reload the hash table from the disk. Thought? Am I missing something? Shouldn't references to invalid pages only occur before we reach a consistent state? If so, the right fix would be to check whether all invalid page references have been resolved after we've reached a consistent state, and to skip creating restart points while there're unresolved page references. best regards, Florian Pflug
On Mon, Sep 26, 2011 at 10:50 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > Currently, if a reference to an invalid page is found during recovery, > its information > is saved in hash table "invalid_page_tab". Then, if such a reference > is resolved, > its information is removed from the hash table. If there is unresolved > reference to > an invalid page in the hash table at the end of recovery, PANIC error occurs. > > What I'm worried about is that the hash table is volatile. If a user restarts > the server before reaching end of recovery, any information in the > hash table is lost, > and we wrongly miss the PANIC error case because we cannot find any unresolved > reference. That is, even if database is corrupted at the end of recovery, > a user might not be able to notice that. This looks like a serious problem. No? > > To prevent the above problem, we should write the contents of the hash table to > the disk for every restartpoints, I think. Then, when the server > starts recovery, > it should reload the hash table from the disk. Thought? Am I missing something? That doesn't happen because when we stop the server it will restart from a valid restartpoint - one where there is no in-progress multi-phase operation. So when it replays it will always replay both parts of the operation. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes: > On Mon, Sep 26, 2011 at 10:50 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> To prevent the above problem, we should write the contents of the hash table to >> the disk for every restartpoints, I think. Then, when the server >> starts recovery, >> it should reload the hash table from the disk. Thought? Am I missing something? > That doesn't happen because the when we stop the server it will > restart from a valid restartpoint - one where there is no in-progress > multi-phase operation. Not clear that that's true. The larger point though is that the invalid-page table is only interesting during crash recovery --- once you've reached a consistent state, it should be empty and remain so. So I see no particular value in Fujii's proposal of logging the table to disk during standby mode. It might be worthwhile to invoke XLogCheckInvalidPages() as soon as we (think we) have reached consistency, rather than leaving it to be done only when we exit recovery mode. regards, tom lane
On Sep26, 2011, at 22:39 , Tom Lane wrote: > It might be worthwhile to invoke XLogCheckInvalidPages() as soon as > we (think we) have reached consistency, rather than leaving it to be > done only when we exit recovery mode. I believe we also need to prevent the creation of restart points before we've reached consistency. If we're starting from an online backup, and a checkpoint occurred between pg_start_backup() and pg_stop_backup(), we currently create a restart point upon replaying that checkpoint's xlog record. At that point, however, unresolved page references are not an error, since a truncation that happened after the checkpoint (but before pg_stop_backup()) might or might not be reflected in the online backup. best regards, Florian Pflug
On Tue, Sep 27, 2011 at 7:28 AM, Florian Pflug <fgp@phlo.org> wrote: > On Sep26, 2011, at 22:39 , Tom Lane wrote: >> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as >> we (think we) have reached consistency, rather than leaving it to be >> done only when we exit recovery mode. > > I believe we also need to prevent the creation of restart points before > we've reached consistency. If we're starting from an online backup, > and a checkpoint occurred between pg_start_backup() and pg_stop_backup(), > we currently create a restart point upon replaying that checkpoint's > xlog record. At that point, however, unresolved page references are > not an error, since a truncation that happened after the checkpoint > (but before pg_stop_backup()) might or might not be reflected in the > online backup. Preventing the creation of restartpoints before reaching the consistency point seems fragile in the case where the backup takes a very long time. It might also take a very long time to reach the consistency point when replaying from that backup. That also prevents the removal of WAL files (e.g., those streamed from the master server) for a long time, which might then cause a disk-full failure. ISTM that writing the invalid-page table to disk at every restartpoint is a better approach. If the invalid-page table is never updated after we've reached the consistency point, we probably should make restartpoints write that table only after that point. And if a reference to an invalid page is found after the consistency point, we should emit an error and cancel recovery. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao <masao.fujii@gmail.com> writes: > ISTM that writing an invalid-page table to the disk for every restartpoints is > better approach. I still say that's uncalled-for overkill. The invalid-page table is not necessary for recovery, it's only a debugging cross-check. You're more likely to introduce bugs than fix any by adding a mechanism like that. regards, tom lane
On Tue, Sep 27, 2011 at 1:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> ISTM that writing an invalid-page table to the disk for every restartpoints is >> better approach. > > I still say that's uncalled-for overkill. The invalid-page table is not > necessary for recovery, it's only a debugging cross-check. If so, is there no risk even if the invalid-page table is lost and the check is skipped unexpectedly? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 26.09.2011 21:06, Simon Riggs wrote: > On Mon, Sep 26, 2011 at 10:50 AM, Fujii Masao<masao.fujii@gmail.com> wrote: > >> Currently, if a reference to an invalid page is found during recovery, >> its information >> is saved in hash table "invalid_page_tab". Then, if such a reference >> is resolved, >> its information is removed from the hash table. If there is unresolved >> reference to >> an invalid page in the hash table at the end of recovery, PANIC error occurs. >> >> What I'm worried about is that the hash table is volatile. If a user restarts >> the server before reaching end of recovery, any information in the >> hash table is lost, >> and we wrongly miss the PANIC error case because we cannot find any unresolved >> reference. That is, even if database is corrupted at the end of recovery, >> a user might not be able to notice that. This looks like a serious problem. No? >> >> To prevent the above problem, we should write the contents of the hash table to >> the disk for every restartpoints, I think. Then, when the server >> starts recovery, >> it should reload the hash table from the disk. Thought? Am I missing something? > > That doesn't happen because the when we stop the server it will > restart from a valid restartpoint - one where there is no in-progress > multi-phase operation. > > So when it replays it will always replay both parts of the operation. I think you're mixing this up with the multi-page page split operations in b-tree. This is different from that. What the "invalid_page_tab" is for is the situation where you for example, insert to a page on table X, and later table X is dropped, and then you crash. On WAL replay, you will see the insert record, but the file for the table doesn't exist, because the table was dropped. In that case we skip the insert, note what happened in invalid_page_tab, and move on with recovery. When we see the later record to drop the table, we know it was OK that the file was missing earlier. 
But if we don't see it before end of recovery, we PANIC, because then the file should've been there. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 27.09.2011 00:28, Florian Pflug wrote: > On Sep26, 2011, at 22:39 , Tom Lane wrote: >> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as >> we (think we) have reached consistency, rather than leaving it to be >> done only when we exit recovery mode. > > I believe we also need to prevent the creation of restart points before > we've reached consistency. Seems reasonable. We could still allow restartpoints when the hash table is empty, though. And once we've reached consistency, we can throw an error immediately in log_invalid_page(), instead of adding the entry in the hash table. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
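The two rules Heikki proposes here can be stated compactly. The sketch below is hypothetical glue, not server code (in the real server the checks would live in CreateRestartPoint() and log_invalid_page()); `reached_consistency` stands in for the startup process's actual state flag:

```python
# Sketch of the two proposed rules; invalid_page_tab is any set-like
# collection of unresolved invalid-page references.

def restartpoint_is_safe(reached_consistency, invalid_page_tab):
    # Rule 1: before consistency, only create a restartpoint when no
    # unresolved invalid-page references would be lost by a restart.
    return reached_consistency or len(invalid_page_tab) == 0

def log_invalid_page(reached_consistency, invalid_page_tab, page):
    # Rule 2: after consistency an invalid page reference can never be
    # resolved by later WAL, so fail immediately instead of queueing it.
    if reached_consistency:
        raise RuntimeError("PANIC: invalid page reference after consistency")
    invalid_page_tab.add(page)
```

Rule 1 is what closes Fujii's original hole: a restart can then only resume from a point at which the table was known to be empty.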
On Tue, Sep 27, 2011 at 6:54 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > I think you're mixing this up with the multi-page page split operations in > b-tree. This is different from that. What the "invalid_page_tab" is for is > the situation where you for example, insert to a page on table X, and later > table X is dropped, and then you crash. On WAL replay, you will see the > insert record, but the file for the table doesn't exist, because the table > was dropped. In that case we skip the insert, note what happened in > invalid_page_tab, and move on with recovery. When we see the later record to > drop the table, we know it was OK that the file was missing earlier. But if > we don't see it before end of recovery, we PANIC, because then the file > should've been there. OK, yes, I see. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote: > On 27.09.2011 00:28, Florian Pflug wrote: >> On Sep26, 2011, at 22:39 , Tom Lane wrote: >>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as >>> we (think we) have reached consistency, rather than leaving it to be >>> done only when we exit recovery mode. >> >> I believe we also need to prevent the creation of restart points before >> we've reached consistency. > > Seems reasonable. We could still allow restartpoints when the hash table is empty, though. And once we've reached consistency, we can throw an error immediately in log_invalid_page(), instead of adding the entry in the hash table. That mimics the way the rm_safe_restartpoint callbacks work, which is good. Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need to create one that checks whether invalid_page_tab is empty. best regards, Florian Pflug
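In outline, Florian's suggestion reuses the existing per-resource-manager hook. The sketch below is a hypothetical Python model (the real rm_safe_restartpoint is a C function pointer in the rmgr table, and b-tree and GIN already register callbacks for their multi-page operations); the proposed RM_XLOG_ID callback simply reports whether invalid_page_tab is empty:

```python
# Hypothetical model of the rm_safe_restartpoint mechanism with the
# proposed RM_XLOG_ID callback added. Names mirror the C code loosely.

invalid_page_tab = set()

def xlog_safe_restartpoint():
    # Proposed new callback: a restartpoint is safe only once every
    # invalid-page reference has been resolved.
    return len(invalid_page_tab) == 0

# Callback slots, keyed by resource manager id; in reality b-tree and
# gin also register predicates for incomplete page splits etc.
rm_safe_restartpoint = {"RM_XLOG_ID": xlog_safe_restartpoint}

def safe_to_create_restartpoint():
    # CreateRestartPoint() consults every registered callback and
    # skips the restartpoint if any of them says "not safe".
    return all(cb() for cb in rm_safe_restartpoint.values())
```

As the later messages note, this piggybacking is debatable, since invalid_page_tab is not tied to any one resource manager.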
On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug <fgp@phlo.org> wrote: > On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote: >> On 27.09.2011 00:28, Florian Pflug wrote: >>> On Sep26, 2011, at 22:39 , Tom Lane wrote: >>>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as >>>> we (think we) have reached consistency, rather than leaving it to be >>>> done only when we exit recovery mode. >>> >>> I believe we also need to prevent the creation of restart points before >>> we've reached consistency. >> >> Seems reasonable. We could still allow restartpoints when the hash table is empty, though. And once we've reached consistency, we can throw an error immediately in log_invalid_page(), instead of adding the entry in the hash table. > > That mimics the way the rm_safe_restartpoint callbacks work, which is good. > > Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need to create one that checks whether invalid_page_tab is empty. Okay, the attached patch prevents the creation of restartpoints by using the rm_safe_restartpoint callback if we've not reached a consistent state yet and the invalid-page table is not empty. But the invalid-page table is not tied to a specific resource manager, so using rm_safe_restartpoint for that seems slightly odd. Is this OK? Also, according to other suggestions, the patch changes XLogCheckInvalidPages() so that it's called as soon as we've reached a consistent state, and changes log_invalid_page() so that it emits a PANIC immediately if consistency has already been reached. These are very good changes, I think, because they enable us to notice immediately a serious problem that causes a PANIC error. Without these changes, you might unfortunately not notice that the standby database is corrupted until failover happens. Though such a problem might rarely happen, I think it's worth making those changes. 
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Sep 29, 2011 at 12:31 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug <fgp@phlo.org> wrote: >> On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote: >>> On 27.09.2011 00:28, Florian Pflug wrote: >>>> On Sep26, 2011, at 22:39 , Tom Lane wrote: >>>>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as >>>>> we (think we) have reached consistency, rather than leaving it to be >>>>> done only when we exit recovery mode. >>>> >>>> I believe we also need to prevent the creation of restart points before >>>> we've reached consistency. >>> >>> Seems reasonable. We could still allow restartpoints when the hash table is empty, though. And once we've reached consistency,we can throw an error immediately in log_invalid_page(), instead of adding the entry in the hash table. >> >> That mimics the way the rm_safe_restartpoint callbacks work, which is good. >> >> Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID,so we'd just need to create one that checks whether invalid_page_tab is empty. > > Okay, the attached patch prevents the creation of restartpoints by using > rm_safe_restartpoint callback if we've not reached a consistent state yet > and the invalid-page table is not empty. But the invalid-page table is not > tied to the specific resource manager, so using rm_safe_restartpoint for > that seems to slightly odd. Is this OK? > > Also, according to other suggestions, the patch changes XLogCheckInvalidPages() > so that it's called as soon as we've reached a consistent state, and changes > log_invalid_page() so that it emits PANIC immediately if consistency is already > reached. These are very good changes, I think. Because they enable us to > notice serious problem which causes PANIC error immediately. Without these > changes, you unfortunately might notice that the standby database is corrupted > when failover happens. 
> Though such a problem might rarely happen, I think it's > worth doing those changes. Patch does everything we agreed it should. Good suggestion from Florian. This worries me slightly now though, because the patch makes us PANIC in a place we didn't use to, and once we do that we cannot restart the server at all. Are we sure we want that? It's certainly a great way to shake down errors in other code... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sep29, 2011, at 13:49 , Simon Riggs wrote: > This worries me slightly now though because the patch makes us PANIC > in a place we didn't used to and once we do that we cannot restart the > server at all. Are we sure we want that? It's certainly a great way to > shake down errors in other code... The patch only introduces a new PANIC condition during archive recovery, though. Crash recovery is unaffected, except that we no longer create restart points before we reach consistency. Also, if we hit an invalid page reference after reaching consistency, the cause is probably either a bug in our recovery code, or (quite unlikely) a corrupted WAL that passed the CRC check. In both cases, the likelihood of data corruption seems high, so PANICing seems like the right thing to do. best regards, Florian Pflug
On Thu, Sep 29, 2011 at 11:12 PM, Florian Pflug <fgp@phlo.org> wrote: > On Sep29, 2011, at 13:49 , Simon Riggs wrote: >> This worries me slightly now though because the patch makes us PANIC >> in a place we didn't used to and once we do that we cannot restart the >> server at all. Are we sure we want that? It's certainly a great way to >> shake down errors in other code... > > The patch only introduces a new PANIC condition during archive recovery, > though. Crash recovery is unaffected, except that we no longer create > restart points before we reach consistency. > > Also, if we hit an invalid page reference after reaching consistency, > the cause is probably either a bug in our recovery code, or (quite unlikely) > a corrupted WAL that passed the CRC check. In both cases, the likelyhood > of data-corruption seems high, so PANICing seems like the right thing to do. Fair enough. We might be able to use FATAL or ERROR instead of PANIC because they also cause all processes to exit when the startup process emits them. For example, we now use FATAL to stop the server in recovery mode when recovery is about to end before we've reached a consistent state. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Sep 30, 2011 at 2:09 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Sep 29, 2011 at 11:12 PM, Florian Pflug <fgp@phlo.org> wrote: >> On Sep29, 2011, at 13:49 , Simon Riggs wrote: >>> This worries me slightly now though because the patch makes us PANIC >>> in a place we didn't used to and once we do that we cannot restart the >>> server at all. Are we sure we want that? It's certainly a great way to >>> shake down errors in other code... >> >> The patch only introduces a new PANIC condition during archive recovery, >> though. Crash recovery is unaffected, except that we no longer create >> restart points before we reach consistency. >> >> Also, if we hit an invalid page reference after reaching consistency, >> the cause is probably either a bug in our recovery code, or (quite unlikely) >> a corrupted WAL that passed the CRC check. In both cases, the likelyhood >> of data-corruption seems high, so PANICing seems like the right thing to do. > > Fair enough. > > We might be able to use FATAL or ERROR instead of PANIC because they > also cause all processes to exit when the startup process emits them. > For example, we now use FATAL to stop the server in recovery mode > when recovery is about to end before we've reached a consistent state. I think we should issue PANIC if the source is a critical rmgr, or just WARNING if from a non-critical rmgr, such as indexes. Ideally, I think we should have a mechanism to allow indexes to be marked corrupt. For example, a file that if present shows that the index is corrupt and would be marked not valid. We can then create the file and send a sinval message to force the index relcache to be rebuilt showing valid set to false. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Sep 30, 2011 at 3:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > I think we should issue PANIC if the source is a critical rmgr, or > just WARNING if from a non-critical rmgr, such as indexes. > > Ideally, I think we should have a mechanism to allow indexes to be > marked corrupt. For example, a file that if present shows that the > index is corrupt and would be marked not valid. We can then create the > file and send a sinval message to force the index relcache to be > rebuilt showing valid set to false. This seems not to be specific to the invalid-page table problem; all error cases from a non-critical rmgr should be treated as non-PANIC. So I think that idea should be implemented separately from the patch I've posted. Anyway, what if a read-only query accesses an index marked invalid? Just emit an ERROR? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 29.09.2011 14:31, Fujii Masao wrote: > On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug<fgp@phlo.org> wrote: >> Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID,so we'd just need to create one that checks whether invalid_page_tab is empty. > > Okay, the attached patch prevents the creation of restartpoints by using > rm_safe_restartpoint callback if we've not reached a consistent state yet > and the invalid-page table is not empty. But the invalid-page table is not > tied to the specific resource manager, so using rm_safe_restartpoint for > that seems to slightly odd. Is this OK? I don't think this should use the rm_safe_restartpoint machinery. As you said, it's not tied to any specific resource manager. And I've actually been thinking that we will get rid of rm_safe_restartpoint altogether in the future. The two things that still use it are the b-tree and gin, and I'd like to change both of those to not require any post-recovery cleanup step to finish multi-page operations, similar to what I did with GiST in 9.1. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Oct 3, 2011 at 8:21 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 29.09.2011 14:31, Fujii Masao wrote: >> >> On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug<fgp@phlo.org> wrote: >>> >>> Actually, why don't we use that machinery to implement this? There's >>> currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need >>> to create one that checks whether invalid_page_tab is empty. >> >> Okay, the attached patch prevents the creation of restartpoints by using >> rm_safe_restartpoint callback if we've not reached a consistent state yet >> and the invalid-page table is not empty. But the invalid-page table is not >> tied to the specific resource manager, so using rm_safe_restartpoint for >> that seems to slightly odd. Is this OK? > > I don't think this should use the rm_safe_restartpoint machinery. As you > said, it's not tied to any specific resource manager. And I've actually been > thinking that we will get rid of rm_safe_restartpoint altogether in the > future. The two things that still use it are the b-tree and gin, and I'd > like to change both of those to not require any post-recovery cleanup step > to finish multi-page operations, similar to what I did with GiST in 9.1. I thought that was quite neat doing it that way, but there's no specific reason to do it that way I guess. If you're happy to rewrite the patch then I guess we're OK. I certainly would like to get rid of rm_safe_restartpoint in the longer term, hopefully sooner. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 3, 2011 at 6:23 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > So I think that the idea should be implemented separately from > the patch I've posted. Agreed. I'll do a final review and commit today. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 3, 2011 at 4:32 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I don't think this should use the rm_safe_restartpoint machinery. As you >> said, it's not tied to any specific resource manager. And I've actually been >> thinking that we will get rid of rm_safe_restartpoint altogether in the >> future. The two things that still use it are the b-tree and gin, and I'd >> like to change both of those to not require any post-recovery cleanup step >> to finish multi-page operations, similar to what I did with GiST in 9.1. > > I thought that was quite neat doing it that way, but there's no > specific reason to do it that way I guess. If you're happy to rewrite > the patch then I guess we're OK. > > I certainly would like to get rid of rm_safe_restartpoint in the > longer term, hopefully sooner. Though Heikki might already be working on that... anyway, the attached patch is a version which doesn't use the rm_safe_restartpoint machinery. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Oct 4, 2011 at 7:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Oct 3, 2011 at 4:32 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> I don't think this should use the rm_safe_restartpoint machinery. As you >>> said, it's not tied to any specific resource manager. And I've actually been >>> thinking that we will get rid of rm_safe_restartpoint altogether in the >>> future. The two things that still use it are the b-tree and gin, and I'd >>> like to change both of those to not require any post-recovery cleanup step >>> to finish multi-page operations, similar to what I did with GiST in 9.1. >> >> I thought that was quite neat doing it that way, but there's no >> specific reason to do it that way I guess. If you're happy to rewrite >> the patch then I guess we're OK. >> >> I certainly would like to get rid of rm_safe_restartpoint in the >> longer term, hopefully sooner. > > Though Heikki might be already working on that,... anyway, > the attached patch is the version which doesn't use rm_safe_restartpoint > machinery. Heikki - I see you are down on the CF app to review this. I'd been working on it as well, just forgot to let Greg know. Did you start already? Should I stop? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 04.10.2011 09:43, Fujii Masao wrote: > On Mon, Oct 3, 2011 at 4:32 PM, Simon Riggs<simon@2ndquadrant.com> wrote: >>> I don't think this should use the rm_safe_restartpoint machinery. As you >>> said, it's not tied to any specific resource manager. And I've actually been >>> thinking that we will get rid of rm_safe_restartpoint altogether in the >>> future. The two things that still use it are the b-tree and gin, and I'd >>> like to change both of those to not require any post-recovery cleanup step >>> to finish multi-page operations, similar to what I did with GiST in 9.1. >> >> I thought that was quite neat doing it that way, but there's no >> specific reason to do it that way I guess. If you're happy to rewrite >> the patch then I guess we're OK. >> >> I certainly would like to get rid of rm_safe_restartpoint in the >> longer term, hopefully sooner. > > Though Heikki might be already working on that,... Just haven't gotten around to it. It's a fair amount of work with little user-visible benefit. > anyway, > the attached patch is the version which doesn't use rm_safe_restartpoint > machinery. Thanks, committed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com