Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" - Mailing list pgsql-hackers

From Anton A. Melnikov
Subject Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Date
Msg-id 2a64f915-213b-4bd0-1cdf-bd987694021e@inbox.ru
Whole thread Raw
In response to Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
List pgsql-hackers
Hi, Thomas!

There are two variants of the patch now.

1) As for the first workaround:

On 31.01.2023 07:09, Thomas Munro wrote:
> 
> Maybe it's unlikely that two samples will match while running that
> torture test, because it's overwriting the file as fast as it can.
> But the idea is that a real system isn't writing the control file most
> of the time.
> 
........
> Yeah, I was thinking that we should also put a limit on the loop, just
> to be cautious.

At first i didn’t understand that the equality condition with the previous
calculated crc and the current one at the second+ attempts was intended
for the case when the pg_control file is really corrupted.

Indeed, by making a few debugging variables and running the tortue test,
i found that there were ~4000 crc errors (~0.0003%) in ~125 million reads from the file,
and there was no case when the crc error appeared twice in a row.
So the second and moreover the third successive occurrence of an crc error
can be neglected and for this workaround seems a simpler and maybe more clear
algorithm is possible.
For instance:

for(try = 0 ; try < 3; try++)
{
    open, read from and close pg_control;
    calculate crc;

    *crc_ok_p = EQ_CRC32C(crc, ControlFile->crc);

    if(*crc_ok_p)
       break;
}

2) As for the second variant of the patch with POSIX locks:

On 31.01.2023 14:38, Thomas Munro wrote:
> On Tue, Jan 31, 2023 at 5:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Clearly there is an element of speculation or superstition here.  I
>> don't know what else to do if both PostgreSQL and ext4 decided not to
>> add interlocking.  Maybe we should rethink that.  How bad would it
>> really be if control file access used POSIX file locking?  I mean, the
>> writer is going to *fsync* the file, so it's not like one more wafer
>> thin system call is going to hurt too much.
> 
> Here's an experimental patch for that alternative.  I wonder if
> someone would want to be able to turn it off for some reason -- maybe
> some NFS problem?  It's less back-patchable, but maybe more
> principled?

It looks very strange to me that there may be cases where the cluster data
is stored in NFS. Can't figure out how this can be useful.

i think this variant of the patch is a normal solution
of the problem, not workaround. Found no problems on Linux.
+1 for this variant.

Might add a custom error message for EDEADLK
since it absent in errcode_for_file_access()?

> I don't know if Windows suffers from this type of problem.
> Unfortunately its equivalent functionality LockFile() looks non-ideal
> for this purpose: if your program crashes, the documentation is very
> vague on when exactly it is released by the OS, but it's not
> immediately on process exit.  That seems non-ideal for a control file
> you might want to access again very soon after a crash, to be able to
> recover.

Unfortunately i've not had time to reproduce the problem and apply this patch on
Windows yet but i'm going to do it soon on windows 10. If a crc error
will occur there, then we might use the workaround from the first
variant of the patch.

> A thought in passing: if UpdateMinRecoveryPoint() performance is an
> issue, maybe we should figure out how to use fdatasync() instead of
> fsync().

May be choose it in accordance with GUC wal_sync_method?


Sincerely yours,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Generating "Subplan Removed" in EXPLAIN
Next
From: Justin Pryzby
Date:
Subject: Re: Progress report of CREATE INDEX for nested partitioned tables