Thread: PATCH: standby crashed when replaying a block which was truncated in the standby but failed to truncate in the master node

The steps to reproduce this issue:
1. Create a table
    create table gist_point_tbl(id int4, p point);
    create index gist_pointidx on gist_point_tbl using gist(p);
2. Insert data
    insert into gist_point_tbl (id, p) select g, point(g*10, g*10) from generate_series(1, 1000000) g;
3. Delete data
    delete from gist_point_tbl;
4. Vacuum table
    vacuum gist_point_tbl;
    -- Send SIGINT to the vacuum process after the WAL record of the truncation has been flushed but before the truncation is finished
    -- We will receive error message "ERROR:  canceling statement due to user request"
5. Vacuum table again
    vacuum gist_point_tbl;
    -- The standby node crashed and the PANIC log is "PANIC:  WAL contains references to invalid pages"
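
For context, the PANIC in step 5 comes from the invalid-page tracking in
src/backend/access/transam/xlogutils.c: redo remembers references to blocks
beyond the relation's current size, and recovery PANICs if any of them are
still unresolved once a consistent state has been reached. A rough sketch of
that check (simplified; the real code also reports each offending page):

static HTAB *invalid_page_tab;      /* unresolved block references */

void
XLogCheckInvalidPages_sketch(void)
{
    if (invalid_page_tab != NULL &&
        hash_get_num_entries(invalid_page_tab) > 0)
        elog(PANIC, "WAL contains references to invalid pages");
}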

The standby node succeeds in replaying the truncation WAL record, but the master node does not actually truncate the file; the standby will then crash once the master writes to blocks that have already been truncated on the standby.
I try to fix this issue by preventing query cancel interrupts during the truncation.

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 5df4382b7e..04b696ae01 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -26,6 +26,7 @@
 #include "access/xlogutils.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "miscadmin.h"
 #include "storage/freespace.h"
 #include "storage/smgr.h"
 #include "utils/memutils.h"
@@ -248,6 +249,14 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
        if (vm)
                visibilitymap_truncate(rel, nblocks);

+       /*
+        * If the WAL record of the truncation is flushed but the transaction
+        * is cancelled (e.g. by SIGINT) before the physical truncation, a
+        * standby that replayed the record will crash once the master writes
+        * to the truncated blocks, so hold off query cancel interrupts here.
+        */
+       HOLD_CANCEL_INTERRUPTS();
+
        /*
         * We WAL-log the truncation before actually truncating, which means
         * trouble if the truncation fails. If we then crash, the WAL replay
@@ -288,6 +297,8 @@ RelationTruncate(Relation rel, BlockNumber nblocks)

        /* Do the real work */
        smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+
+       RESUME_CANCEL_INTERRUPTS();
 }
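
For reference, HOLD_CANCEL_INTERRUPTS() and RESUME_CANCEL_INTERRUPTS() only
adjust a holdoff counter that ProcessInterrupts() consults before acting on a
pending cancel; roughly, paraphrased from src/include/miscadmin.h (details may
differ slightly):

#define HOLD_CANCEL_INTERRUPTS()  (QueryCancelHoldoffCount++)

#define RESUME_CANCEL_INTERRUPTS() \
do { \
    Assert(QueryCancelHoldoffCount > 0); \
    QueryCancelHoldoffCount--; \
} while (0)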


 

Is this an issue? 
Can we fix like this?
Thanks!






On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
>Is this an issue?
>Can we fix like this?
>Thanks!
>

I do think it is a valid issue. No opinion on the fix yet, though.
The report was sent on saturday, so patience ;-)

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



On Mon, Sep 23, 2019 at 01:45:14PM +0200, Tomas Vondra wrote:
> On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
>> Is this an issue?
>> Can we fix like this?
>> Thanks!
>>
>
> I do think it is a valid issue. No opinion on the fix yet, though.
> The report was sent on saturday, so patience ;-)

And for some others it was even a longer weekend.  Anyway, the problem
can be reproduced if you apply the attached which introduces a failure
point, and then if you run the following commands:
create table aa as select 1;
delete from aa;
\! touch /tmp/truncate_flag
vacuum aa;
\! rm /tmp/truncate_flag
vacuum aa; -- panic on standby
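
The attachment is not inlined here, but a hypothetical injection of that
flavor (file name and placement illustrative only, not the actual patch) is
just a few lines in RelationTruncate(), after the WAL record is flushed and
before smgrtruncate(), plus an #include of <sys/stat.h>:

    /* Hypothetical fault injection, not the attached patch. */
    {
        struct stat st;

        if (stat("/tmp/truncate_flag", &st) == 0)
            elog(ERROR, "fault injection: failing before the physical truncation");
    }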

This also points out that there are other things to worry about than
interruptions, as for example DropRelFileNodeLocalBuffers() could lead
to an ERROR, and this happens before the physical truncation is done
but after the WAL record is replayed on the standby, so any failures
happening at the truncation phase before the work is done would be a
problem.  However we are talking about failures which should not
happen and these are elog() calls.  It would be tempting to add a
critical section here, but we could still have problems if we have a
failure after the WAL record has been flushed, which means that it
would be replayed on the standby, and the surrounding comments are
clear about that.  In short, as a matter of safety I'd like to think
that what you are suggesting is rather acceptable (aka hold interrupts
before the WAL record is written and release after the physical
truncate), so that the truncation avoids any failures that can be avoided.
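
To make that ordering concrete, here is a simplified skeleton of
RelationTruncate() with the suggested placement (a sketch, not a patch; the
FSM/VM handling and the WAL record construction are elided into comments):

void
RelationTruncate_sketch(Relation rel, BlockNumber nblocks)
{
    /* ... truncate the FSM and visibility map, drop buffers, etc. ... */

    /* Hold off query cancel before the WAL record can be written. */
    HOLD_CANCEL_INTERRUPTS();

    if (RelationNeedsWAL(rel))
    {
        /* Build, insert and flush the XLOG_SMGR_TRUNCATE record here. */
    }

    /* Do the real work. */
    smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);

    /* Only now is it safe to accept a query cancel again. */
    RESUME_CANCEL_INTERRUPTS();
}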

Do others have thoughts to share on the matter?
--
Michael

Attachment
Hello.

At Tue, 24 Sep 2019 10:40:19 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190924014019.GB2012@paquier.xyz>
> On Mon, Sep 23, 2019 at 01:45:14PM +0200, Tomas Vondra wrote:
> > On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
> >> Is this an issue?
> >> Can we fix like this?
> >> Thanks!
> >> 
> > 
> > I do think it is a valid issue. No opinion on the fix yet, though.
> > The report was sent on saturday, so patience ;-)
> 
> And for some others it was even a longer weekend.  Anyway, the problem
> can be reproduced if you apply the attached which introduces a failure
> point, and then if you run the following commands:
> create table aa as select 1;
> delete from aa;
> \! touch /tmp/truncate_flag
> vacuum aa;
> \! rm /tmp/truncate_flag
> vacuum aa; -- panic on standby
> 
> This also points out that there are other things to worry about than
> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
> to an ERROR, and this happens before the physical truncation is done
> but after the WAL record is replayed on the standby, so any failures
> happening at the truncation phase before the work is done would be a

Indeed.

> problem.  However we are talking about failures which should not
> happen and these are elog() calls.  It would be tempting to add a
> critical section here, but we could still have problems if we have a
> failure after the WAL record has been flushed, which means that it
> would be replayed on the standby, and the surrounding comments are

Agreed.

> clear about that.  In short, as a matter of safety I'd like to think
> that what you are suggesting is rather acceptable (aka hold interrupts
> before the WAL record is written and release after the physical
> truncate), so that the truncation avoids any failures that can be avoided.
> 
> Do others have thoughts to share on the matter?

Agreed on the concept, but does the patch work as described? It
seems that query cancel doesn't fire during the held-off
section since there is no CHECK_FOR_INTERRUPTS() there.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



At Tue, 24 Sep 2019 12:46:19 +0900 (Tokyo Standard Time), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
<20190924.124619.248088532.horikyota.ntt@gmail.com>
> > clear about that.  In short, as a matter of safety I'd like to think
> > that what you are suggesting is rather acceptable (aka hold interrupts
> > before the WAL record is written and release after the physical
> > truncate), so that the truncation avoids any failures that can be avoided.
> > 
> > Do others have thoughts to share on the matter?
> 
> Agreed on the concept, but does the patch work as described? It
> seems that query cancel doesn't fire during the held-off
> section since there is no CHECK_FOR_INTERRUPTS() there.

Of course I found no *explicit* ones. But I found one
ereport(DEBUG1) call in register_dirty_segment(). So it will work at
least for the case where the fsync request queue is full.
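
errfinish() ends with a CHECK_FOR_INTERRUPTS(), which is how an
ereport(DEBUG1) can service a pending cancel.  With the holdoff in place that
check becomes harmless, because ProcessInterrupts() only acts on a cancel
when the holdoff count is zero; roughly, simplified from
src/backend/tcop/postgres.c:

    if (QueryCancelPending)
    {
        if (QueryCancelHoldoffCount != 0)
        {
            /* Re-arm InterruptPending so the cancel is processed later. */
            InterruptPending = true;
        }
        else
        {
            QueryCancelPending = false;
            ereport(ERROR,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling statement due to user request")));
        }
    }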

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Tue, Sep 24, 2019 at 02:48:16PM +0900, Kyotaro Horiguchi wrote:
> Of course I found no *explicit* ones. But I found one
> ereport(DEBUG1) call in register_dirty_segment(). So it will work at
> least for the case where the fsync request queue is full.

Exactly.  I have not checked the patch in detail, but I think that
we should not rely on the assumption that no code path in this area
checks for interrupts, as smgrtruncate() does much
more than just the physical segment truncation.
--
Michael

Attachment
At Wed, 25 Sep 2019 12:24:03 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190925032403.GF1815@paquier.xyz>
> On Tue, Sep 24, 2019 at 02:48:16PM +0900, Kyotaro Horiguchi wrote:
> > Of course I found no *explicit* ones. But I found one
> > ereport(DEBUG1) call in register_dirty_segment(). So it will work at
> > least for the case where the fsync request queue is full.
> 
> Exactly.  I have not checked the patch in detail, but I think that
> we should not rely on the assumption that no code path in this area
> checks for interrupts, as smgrtruncate() does much
> more than just the physical segment truncation.

Agreed on that point. I just doubted that it really fixes the
author's problem, and have confirmed that it can.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Sep 23, 2019 at 01:45:14PM +0200, Tomas Vondra wrote:
> > On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
> >> Is this an issue?
> >> Can we fix like this?
> >> Thanks!
> >>
> >
> > I do think it is a valid issue. No opinion on the fix yet, though.
> > The report was sent on saturday, so patience ;-)
>
> And for some others it was even a longer weekend.  Anyway, the problem
> can be reproduced if you apply the attached which introduces a failure
> point, and then if you run the following commands:
> create table aa as select 1;
> delete from aa;
> \! touch /tmp/truncate_flag
> vacuum aa;
> \! rm /tmp/truncate_flag
> vacuum aa; -- panic on standby
>
> This also points out that there are other things to worry about than
> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
> to an ERROR, and this happens before the physical truncation is done
> but after the WAL record is replayed on the standby, so any failures
> happening at the truncation phase before the work is done would be a
> problem.  However we are talking about failures which should not
> happen and these are elog() calls.  It would be tempting to add a
> critical section here, but we could still have problems if we have a
> failure after the WAL record has been flushed, which means that it
> would be replayed on the standby, and the surrounding comments are
> clear about that.

Could you elaborate on what problem adding a critical section there would cause?

Regards,

-- 
Fujii Masao



On Thu, Sep 26, 2019 at 01:13:56AM +0900, Fujii Masao wrote:
> On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <michael@paquier.xyz> wrote:
>> This also points out that there are other things to worry about than
>> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
>> to an ERROR, and this happens before the physical truncation is done
>> but after the WAL record is replayed on the standby, so any failures
>> happening at the truncation phase before the work is done would be a
>> problem.  However we are talking about failures which should not
>> happen and these are elog() calls.  It would be tempting to add a
>> critical section here, but we could still have problems if we have a
>> failure after the WAL record has been flushed, which means that it
>> would be replayed on the standby, and the surrounding comments are
>> clear about that.
>
> Could you elaborate on what problem adding a critical section there would cause?

Wrapping the call of smgrtruncate() within RelationTruncate() to use a
critical section would make things worse from the user perspective on
the primary, no?  If the physical truncation fails, we would still
fail WAL replay on the standby, but instead of generating an ERROR in
the session of the user attempting the TRUNCATE, the whole primary
would be taken down.
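
To spell that out, the variant under discussion would look roughly like this
(a sketch only, not a proposal):

    /*
     * Any ERROR raised in here -- e.g. from the buffer-dropping done inside
     * smgrtruncate(), or from a failed ftruncate() -- is promoted to PANIC
     * and takes the whole primary down.
     */
    START_CRIT_SECTION();
    smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
    END_CRIT_SECTION();
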
--
Michael

Attachment
On Fri, Sep 27, 2019 at 3:14 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 26, 2019 at 01:13:56AM +0900, Fujii Masao wrote:
> > On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <michael@paquier.xyz> wrote:
> >> This also points out that there are other things to worry about than
> >> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
> >> to an ERROR, and this happens before the physical truncation is done
> >> but after the WAL record is replayed on the standby, so any failures
> >> happening at the truncation phase before the work is done would be a
> >> problem.  However we are talking about failures which should not
> >> happen and these are elog() calls.  It would be tempting to add a
> >> critical section here, but we could still have problems if we have a
> >> failure after the WAL record has been flushed, which means that it
> >> would be replayed on the standby, and the surrounding comments are
> >> clear about that.
> >
> > Could you elaborate on what problem adding a critical section there would cause?
>
> Wrapping the call of smgrtruncate() within RelationTruncate() to use a
> critical section would make things worse from the user perspective on
> the primary, no?  If the physical truncation fails, we would still
> fail WAL replay on the standby, but instead of generating an ERROR in
> the session of the user attempting the TRUNCATE, the whole primary
> would be taken down.

Thanks for elaborating that! Understood.

But this can cause subsequent recovery to always fail with invalid-pages error
and the server not to start up. This is bad. So, to alleviate the situation,
I'm thinking it would be worth adding something like an ignore_invalid_pages
developer parameter. When this parameter is set to true, the startup process
always ignores invalid-pages errors. Thought?
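
Something like the following, perhaps -- just a sketch of the idea, not a
finished patch (the description string and exact hook point are placeholders):

    /* In guc.c's ConfigureNamesBool[]: */
    {
        {"ignore_invalid_pages", PGC_POSTMASTER, DEVELOPER_OPTIONS,
            gettext_noop("Continues recovery after detecting invalid page references in WAL."),
            NULL,
            GUC_NOT_IN_SAMPLE
        },
        &ignore_invalid_pages,
        false,
        NULL, NULL, NULL
    },

    /* And in xlogutils.c, where recovery would otherwise PANIC: */
    elog(ignore_invalid_pages ? WARNING : PANIC,
         "WAL contains references to invalid pages");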

Regards,

-- 
Fujii Masao



On Thu, Oct 03, 2019 at 01:49:34PM +0900, Fujii Masao wrote:
> But this can cause subsequent recovery to always fail with invalid-pages error
> and the server not to start up. This is bad. So, to alleviate the situation,
> I'm thinking it would be worth adding something like an ignore_invalid_pages
> developer parameter. When this parameter is set to true, the startup process
> always ignores invalid-pages errors. Thought?

That could be helpful.
--
Michael

Attachment
On Thu, Oct 3, 2019 at 1:57 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Oct 03, 2019 at 01:49:34PM +0900, Fujii Masao wrote:
> > But this can cause subsequent recovery to always fail with invalid-pages error
> > and the server not to start up. This is bad. So, to alleviate the situation,
> > I'm thinking it would be worth adding something like an ignore_invalid_pages
> > developer parameter. When this parameter is set to true, the startup process
> > always ignores invalid-pages errors. Thought?
>
> That could be helpful.

So attached patch adds new developer GUC "ignore_invalid_pages".
Setting ignore_invalid_pages to true causes the system
to ignore the failure (but still report a warning), and continue recovery.

I will add this to next CommitFest.

Regards,

-- 
Fujii Masao

Attachment
On Thu, Oct 03, 2019 at 05:54:40PM +0900, Fujii Masao wrote:
> On Thu, Oct 3, 2019 at 1:57 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Oct 03, 2019 at 01:49:34PM +0900, Fujii Masao wrote:
> > > But this can cause subsequent recovery to always fail with invalid-pages error
> > > and the server not to start up. This is bad. So, to alleviate the situation,
> > > I'm thinking it would be worth adding something like an ignore_invalid_pages
> > > developer parameter. When this parameter is set to true, the startup process
> > > always ignores invalid-pages errors. Thought?
> >
> > That could be helpful.
>
> So attached patch adds new developer GUC "ignore_invalid_pages".
> Setting ignore_invalid_pages to true causes the system
> to ignore the failure (but still report a warning), and continue recovery.
>
> I will add this to next CommitFest.

No actual objections against this patch from me as a dev option.

+        Detection of WAL records having references to invalid pages during
+        recovery causes <productname>PostgreSQL</productname> to report
+        an error, aborting the recovery. Setting
Well, that's not really an error.  This triggers a PANIC, aka crashes
the server.  And in this case the actual problem is that you may not
be able to move on with recovery when restarting the server again,
except if luck is on your side because you would continuously face
it..

+        recovery. This behavior may <emphasis>cause crashes, data loss,
+         propagate or hide corruption, or other serious problems</emphasis>.
Nit: indentation on the second line here.

+        However, it may allow you to get past the error, finish the recovery,
+        and cause the server to start up.
For consistency here I would suggest the second part of the sentence
to be "TO finish recovery, and TO cause the server to start up".

+        The default setting is off, and it can only be set at server start.
Nit^2: Missing a <literal> markup for "off"?
--
Michael

Attachment
On Fri, Nov 29, 2019 at 11:39 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Oct 03, 2019 at 05:54:40PM +0900, Fujii Masao wrote:
> > On Thu, Oct 3, 2019 at 1:57 PM Michael Paquier <michael@paquier.xyz> wrote:
> > >
> > > On Thu, Oct 03, 2019 at 01:49:34PM +0900, Fujii Masao wrote:
> > > > But this can cause subsequent recovery to always fail with invalid-pages error
> > > > and the server not to start up. This is bad. So, to alleviate the situation,
> > > > I'm thinking it would be worth adding something like an ignore_invalid_pages
> > > > developer parameter. When this parameter is set to true, the startup process
> > > > always ignores invalid-pages errors. Thought?
> > >
> > > That could be helpful.
> >
> > So attached patch adds new developer GUC "ignore_invalid_pages".
> > Setting ignore_invalid_pages to true causes the system
> > to ignore the failure (but still report a warning), and continue recovery.
> >
> > I will add this to next CommitFest.
>
> No actual objections against this patch from me as a dev option.

Thanks for the review! Attached is the updated version of the patch.

> +        Detection of WAL records having references to invalid pages during
> +        recovery causes <productname>PostgreSQL</productname> to report
> +        an error, aborting the recovery. Setting
> Well, that's not really an error.  This triggers a PANIC, aka crashes
> the server.  And in this case the actual problem is that you may not
> be able to move on with recovery when restarting the server again,
> except if luck is on your side because you would continuously face
> it..

So you're thinking that "report an error" should be changed to
"trigger a PANIC"? Personally "report an error" sounds ok because
PANIC is one of "error", I think. But if that misleads people,
I will change the sentence.

> +        recovery. This behavior may <emphasis>cause crashes, data loss,
> +         propagate or hide corruption, or other serious problems</emphasis>.
> Nit: indentation on the second line here.

Yes, I fixed that.

> +        However, it may allow you to get past the error, finish the recovery,
> +        and cause the server to start up.
> For consistency here I would suggest the second part of the sentence
> to be "TO finish recovery, and TO cause the server to start up".

Yes, I fixed that.

> +        The default setting is off, and it can only be set at server start.
> Nit^2: Missing a <literal> markup for "off"?

Yes, I fixed that.

Regards,

-- 
Fujii Masao

Attachment
On Mon, Dec 16, 2019 at 12:22:18PM +0900, Fujii Masao wrote:
> > +        Detection of WAL records having references to invalid pages during
> > +        recovery causes <productname>PostgreSQL</productname> to report
> > +        an error, aborting the recovery. Setting
> > Well, that's not really an error.  This triggers a PANIC, aka crashes
> > the server.  And in this case the actual problem is that you may not
> > be able to move on with recovery when restarting the server again,
> > except if luck is on your side because you would continuously face
> > it..
>
> So you're thinking that "report an error" should be changed to
> "trigger a PANIC"? Personally "report an error" sounds ok because
> PANIC is one of "error", I think. But if that misleads people,
> I will change the sentence.

In the context of a recovery, an ERROR is promoted to a FATAL, but
here we are talking about something that bypasses the crash of the
server.  So this could bring confusion.  I think that the
documentation should be crystal clear about that, with two aspects
outlined when the parameter is disabled, somewhat like data_sync_retry
actually:
- A PANIC-level error is triggered.
- It crashes the cluster.
--
Michael

Attachment
On Tue, Dec 17, 2019 at 2:19 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Dec 16, 2019 at 12:22:18PM +0900, Fujii Masao wrote:
> > > +        Detection of WAL records having references to invalid pages during
> > > +        recovery causes <productname>PostgreSQL</productname> to report
> > > +        an error, aborting the recovery. Setting
> > > Well, that's not really an error.  This triggers a PANIC, aka crashes
> > > the server.  And in this case the actual problem is that you may not
> > > be able to move on with recovery when restarting the server again,
> > > except if luck is on your side because you would continuously face
> > > it..
> >
> > So you're thinking that "report an error" should be changed to
> > "trigger a PANIC"? Personally "report an error" sounds ok because
> > PANIC is one of "error", I think. But if that misleads people,
> > I will change the sentence.
>
> In the context of a recovery, an ERROR is promoted to a FATAL, but
> here we are talking about something that bypasses the crash of the
> server.  So this could bring confusion.  I think that the
> documentation should be crystal clear about that, with two aspects
> outlined when the parameter is disabled, somewhat like data_sync_retry
> actually:
> - A PANIC-level error is triggered.
> - It crashes the cluster.

OK, I updated the patch that way.
Attached is the updated version of the patch.

Regards,

-- 
Fujii Masao

Attachment
On Thu, Jan 16, 2020 at 11:17:36PM +0900, Fujii Masao wrote:
> OK, I updated the patch that way.
> Attached is the updated version of the patch.

Thanks.  I have few tweaks to propose to the docs.

+        raise a PANIC-level error, aborting the recovery. Setting
Instead of "PANIC-level error", I would just use "PANIC error", and
instead of "aborting the recovery" just "crashing the server".

+        causes the system to ignore those WAL records
WAL records are not ignored, but errors caused by incorrect page
references in those WAL records are.  The current phrasing sounds like
the WAL records are not applied.

Another thing that I just recalled.  Do you think that it would be
better to mention that invalid page references can only be seen after
reaching the consistent point during recovery?  The information given
looks enough, but I was just wondering if that's worth documenting or
not.
--
Michael

Attachment
On Fri, Jan 17, 2020 at 1:47 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jan 16, 2020 at 11:17:36PM +0900, Fujii Masao wrote:
> > OK, I updated the patch that way.
> > Attached is the updated version of the patch.
>
> Thanks.  I have few tweaks to propose to the docs.
>
> +        raise a PANIC-level error, aborting the recovery. Setting
> Instead of "PANIC-level error", I would just use "PANIC error", and

I have no strong opinion about this, but I used "PANIC-level error"
because the description for data_sync_retry has already used it.

> instead of "aborting the recovery" just "crashing the server".

PANIC implies server crash, so IMO "crashing the server" is
a bit redundant, and "aborting the recovery" is better because
"continue the recovery" is used later.

> +        causes the system to ignore those WAL records
> WAL records are not ignored, but errors caused by incorrect page
> references in those WAL records are.  The current phrasing sounds like
> the WAL records are not applied.

So, what about

---------------
causes the system to ignore invalid page references in WAL records
(but still report a warning), and continue the recovery.
---------------

> Another thing that I just recalled.  Do you think that it would be
> better to mention that invalid page references can only be seen after
> reaching the consistent point during recovery?  The information given
> looks enough, but I was just wondering if that's worth documenting or
> not.

ISTM that this is not the information that users should understand...

Regards,

-- 
Fujii Masao



On Fri, Jan 17, 2020 at 07:36:51PM +0900, Fujii Masao wrote:
> On Fri, Jan 17, 2020 at 1:47 PM Michael Paquier <michael@paquier.xyz> wrote:
>> Thanks.  I have few tweaks to propose to the docs.
>>
>> +        raise a PANIC-level error, aborting the recovery. Setting
>> Instead of "PANIC-level error", I would just use "PANIC error", and
>
> I have no strong opinion about this, but I used "PANIC-level error"
> because the description for data_sync_retry has already used it.

Okay.  Fine with what you think is good.

>> instead of "aborting the recovery" just "crashing the server".
>
> PANIC implies server crash, so IMO "crashing the server" is
> a bit redundant, and "aborting the recovery" is better because
> "continue the recovery" is used later.

Okay.  I see your point here.

> So, what about
>
> ---------------
> causes the system to ignore invalid page references in WAL records
> (but still report a warning), and continue the recovery.
> ---------------

And that sounds good to me.  Switching the patch as ready for
committer.
--
Michael

Attachment
On Sat, Jan 18, 2020 at 9:18 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jan 17, 2020 at 07:36:51PM +0900, Fujii Masao wrote:
> > So, what about
> >
> > ---------------
> > causes the system to ignore invalid page references in WAL records
> > (but still report a warning), and continue the recovery.
> > ---------------
>
> And that sounds good to me.  Switching the patch as ready for
> committer.
>

Are we planning to do something about the original problem reported in
this thread?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Jan 20, 2020 at 02:13:53PM +0530, Amit Kapila wrote:
> Are we planning to do something about the original problem reported in
> this thread?

We should.  This is on my TODO list, though seeing that it involved
full_page_writes=off I drifted a bit away from it.
--
Michael

Attachment
On Tue, Jan 21, 2020 at 6:05 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Jan 20, 2020 at 02:13:53PM +0530, Amit Kapila wrote:
> > Are we planning to do something about the original problem reported in
> > this thread?
>
> We should.  This is on my TODO list, though seeing that it involved
> full_page_writes=off I drifted a bit away from it.
>

The original email doesn't say so.  I might be missing something, but
can you explain what makes you think so.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Tue, Jan 21, 2020 at 08:45:14AM +0530, Amit Kapila wrote:
> The original email doesn't say so.  I might be missing something, but
> can you explain what makes you think so.

Oops.  Incorrect thread, I was thinking about this one previously:
https://www.postgresql.org/message-id/822113470.250068.1573246011818@connect.xfinity.com

Re-reading the root of the thread, I am still not sure what we could
do, as that's rather tricky.  See here:
https://www.postgresql.org/message-id/20190927061414.GF8485@paquier.xyz
--
Michael

Attachment

On 2020/01/21 13:39, Michael Paquier wrote:
> On Tue, Jan 21, 2020 at 08:45:14AM +0530, Amit Kapila wrote:
>> The original email doesn't say so.  I might be missing something, but
>> can you explain what makes you think so.
> 
> Oops.  Incorrect thread, I was thinking about this one previously:
> https://www.postgresql.org/message-id/822113470.250068.1573246011818@connect.xfinity.com
> 
> Re-reading the root of the thread, I am still not sure what we could
> do, as that's rather tricky.  See here:
> https://www.postgresql.org/message-id/20190927061414.GF8485@paquier.xyz

The original proposal, i.e., holding the interrupts during
the truncation, is worth considering? It is not a perfect
solution but might improve the situation a bit.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




On 2020/01/18 12:48, Michael Paquier wrote:
> On Fri, Jan 17, 2020 at 07:36:51PM +0900, Fujii Masao wrote:
>> On Fri, Jan 17, 2020 at 1:47 PM Michael Paquier <michael@paquier.xyz> wrote:
>>> Thanks.  I have few tweaks to propose to the docs.
>>>
>>> +        raise a PANIC-level error, aborting the recovery. Setting
>>> Instead of "PANIC-level error", I would just use "PANIC error", and
>>
>> I have no strong opinion about this, but I used "PANIC-level error"
>> because the description for data_sync_retry has already used it.
> 
> Okay.  Fine with what you think is good.
> 
>>> instead of "aborting the recovery" just "crashing the server".
>>
>> PANIC implies server crash, so IMO "crashing the server" is
>> a bit redundant, and "aborting the recovery" is better because
>> "continue the recovery" is used later.
> 
> Okay.  I see your point here.
> 
>> So, what about
>>
>> ---------------
>> causes the system to ignore invalid page references in WAL records
>> (but still report a warning), and continue the recovery.
>> ---------------
> 
> And that sounds good to me.  Switching the patch as ready for
> committer.

Thanks! Committed!

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Hi,

On 2020-01-21 15:41:54 +0900, Fujii Masao wrote:
> On 2020/01/21 13:39, Michael Paquier wrote:
> > On Tue, Jan 21, 2020 at 08:45:14AM +0530, Amit Kapila wrote:
> > > The original email doesn't say so.  I might be missing something, but
> > > can you explain what makes you think so.
> >
> > Oops.  Incorrect thread, I was thinking about this one previously:
> > https://www.postgresql.org/message-id/822113470.250068.1573246011818@connect.xfinity.com
> >
> > Re-reading the root of the thread, I am still not sure what we could
> > do, as that's rather tricky.

Did anybody consider the proposal at
https://www.postgresql.org/message-id/20191223005430.yhf4n3zr4ojwbcn2%40alap3.anarazel.de ?
I think we're going to have to do something like that to actually fix
the problem, rather than polish around the edges.


> See here:
> https://www.postgresql.org/message-id/20190927061414.GF8485@paquier.xyz

On 2019-09-27 15:14:14 +0900, Michael Paquier wrote:
> Wrapping the call of smgrtruncate() within RelationTruncate() to use a
> critical section would make things worse from the user perspective on
> the primary, no?  If the physical truncation fails, we would still
> fail WAL replay on the standby, but instead of generating an ERROR in
> the session of the user attempting the TRUNCATE, the whole primary
> would be taken down.

FWIW, to me this argument just doesn't make any sense - even if a few
people have argued it.

A failure in the FS truncate currently leaves the cluster in a
corrupted state in multiple ways:
1) Dirty buffer contents were thrown away, and going forward their old
   contents will be read back.
2) We have WAL logged something that we haven't done. That's *obviously*
   something *completely* violating WAL logging rules. And it breaks WAL
   replay (including locally, should we crash before the next
   checkpoint - there could be subsequent WAL records relying on the
   block's existence).

That's so obviously worse than a PANIC restart, that I really don't
understand the "worse from the user perspective" argument from your
email above.  Obviously it sucks that the error might re-occur during
recovery. But that's something that usually actually can be fixed -
whereas the data corruption can't.


> The original proposal, i.e., holding the interrupts during
> the truncation, is worth considering? It is not a perfect
> solution but might improve the situation a bit.

I don't think it's useful in isolation.

Greetings,

Andres Freund