Re: Stronger safeguard for archive recovery not to miss data - Mailing list pgsql-hackers

From David Steele
Subject Re: Stronger safeguard for archive recovery not to miss data
Date
Msg-id b710df78-d7b8-2a4e-80f1-cb8ddf31e092@pgmasters.net
Whole thread Raw
In response to Re: Stronger safeguard for archive recovery not to miss data  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Stronger safeguard for archive recovery not to miss data  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On 3/25/21 9:23 PM, Fujii Masao wrote:
> 
> On 2021/03/25 23:21, David Steele wrote:
>> On 1/25/21 3:55 AM, Laurenz Albe wrote:
>>> On Mon, 2021-01-25 at 08:19 +0000, osumi.takamichi@fujitsu.com wrote:
>>>>> I think you should pst another patch where the second, now 
>>>>> superfluous, error
>>>>> message is removed.
>>>>
>>>> Updated. This patch showed no failure during regression tests
>>>> and has been aligned by pgindent.
>>>
>>> Looks good to me.
>>> I'll set it to "ready for committer" again.
>>
>> Fujii, does the new patch in [1] address your concerns?
> 
> No. I'm still not sure if this patch is good idea... I understand
> why this safeguard is necessary. OTOH I'm afraid it increases
> a bit the risk that users get unstartable database, i.e., lose whole 
> database.
> But maybe I'm concerned about rare case and my opinion is minority one.
> So I'd like to hear more opinions about this patch.

After reviewing the patch I am +1. I think allowing corruption in 
recovery by default is not a good idea. There is currently a warning but 
I don't think that is nearly strong enough and is easily missed.

Also, "data may be missing" makes this sound like an acceptable 
situation. What is really happening is corruption is being introduced 
that may make the cluster unusable or at the very least lead to errors 
during normal operation.

If we want to allow recovery to play past this point I think it would 
make more sense to have a GUC (like ignore_invalid_pages [1]) that 
allows recovery to proceed and emits a warning instead of fatal.

Looking the patch, I see a few things:

1) Typo in the tests:

This protection shold apply to recovery mode

should be:

This protection should apply to recovery mode

2) I don't think it should be the job of this patch to restructure the 
if conditions, even if it is more efficient. It just obscures the 
purpose of the patch. So, I would revert all the changes in xlog.c 
except changing the warning to an error:

-        ereport(WARNING,
-                (errmsg("WAL was generated with wal_level=minimal, data may be 
missing"),
-                 errhint("This happens if you temporarily set wal_level=minimal 
without taking a new base backup.")));
+            ereport(FATAL,
+                    (errmsg("WAL was generated with wal_level=minimal, cannot continue 
recovering"),
+                     errdetail("This happens if you temporarily set wal_level=minimal 
on the server."),
+                     errhint("Run recovery again from a new base backup taken after 
setting wal_level higher than minimal")));

-- 
-David
david@pgmasters.net

[1] 
https://www.postgresql.org/message-id/E1iu6D8-0000tK-Cm%40gemulon.postgresql.org



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: WIP: BRIN multi-range indexes
Next
From: Tom Lane
Date:
Subject: Re: [PATCH] pg_permissions