Re: Missing pg_control crashes postmaster - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Missing pg_control crashes postmaster
Date
Msg-id 55F8476D-DC2A-4BA9-8A34-D2605F558910@anarazel.de
In response to Re: Missing pg_control crashes postmaster  (David Steele <david@pgmasters.net>)
Responses Re: Missing pg_control crashes postmaster  (David Steele <david@pgmasters.net>)
List pgsql-hackers

On July 25, 2018 7:18:30 AM PDT, David Steele <david@pgmasters.net> wrote:
>On 7/23/18 7:00 PM, Tom Lane wrote:
>> Brian Faherty <anothergenericuser@gmail.com> writes:
>>
>>> There does not really seem to be a need for this behavior as all the
>>> information postgres needs is in memory at this point. I propose with
>>> a patch to just recreate pg_control on updates if it does not exist.
>>
>> I would vote to reject any such patch; it's too likely to cause more
>> problems than it solves.  Generally, if critical files like that one
>> have disappeared, trying to write new data isn't going to be enough
>> to fix it and could well result in more corruption.
>>
>> As an example, imagine that you do "rm -rf $PGDATA; initdb" without
>> remembering to shut down the old postmaster first.  Currently, the
>> old postmaster will panic/quit fairly promptly and no harm done.
>> The more aggressive it is at trying to "recover" from the situation,
>> the more likely it is to corrupt the new installation.
>
>It seems much more likely that a missing/modified postmaster.pid will
>cause postgres to panic than it is for a missing pg_control to do so.
>
>Older versions of postgres don't panic until the next checkpoint and
>newer versions won't panic at all on an idle system since we fixed
>redundant checkpoints in 9.6 (6ef2eba3).  An idle postgres 11 cluster
>seems happy enough to run without a pg_control file indefinitely (or at
>least 10 minutes, which is past the default checkpoint time).  As soon
>as I write data or perform a checkpoint it does panic, of course.
>
>Conversely, removing/modifying postmaster.pid causes postgres to panic
>very quickly on the versions I tested, 9.4 and 11.
>
>It seems to me that doing the postmaster.pid test at checkpoint time (if
>we don't already) would be enough to protect pg_control against
>unintentionally replaced clusters.
>
>Or perhaps writing to an alternate file as David J suggests would do the
>trick.
>
>It seems like an easy win if we can find a safe way to do it, though I
>admit that this is only a benefit in corner cases.

What would we win here? Which scenario that's not contrived would be less bad due to the proposed change?  This seems
like complexity for its own sake.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

