Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" - Mailing list pgsql-hackers

From Anton A. Melnikov
Subject Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Date
Msg-id d1a8f3a8-3ef9-1cd7-f4e9-ab1712c14745@inbox.ru
Whole thread Raw
In response to Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
List pgsql-hackers
Hi, Thomas!

Thanks for your rapid answer and sorry for my delay with reply.

On 01.02.2023 09:45, Thomas Munro wrote:
>> Might add a custom error message for EDEADLK
>> since it absent in errcode_for_file_access()?
> 
> Ah, good thought.  I think it shouldn't happen™, so it's OK that
> errcode_for_file_access() would classify it as ERRCODE_INTERNAL_ERROR.

Yes, i also think that is impossible since the lock is taken on
the entire file, so ERRCODE_INTERNAL_ERROR will be right here.

> Other interesting errors are:
> 
> ENOLCK: system limits exceeded; PANIC seems reasonable
> EOPNOTSUPP: this file doesn't support locking (seen on FreeBSD man
> pages, not on POSIX)

Agreed that ENOLCK is a PANIC or at least FATAL. Maybe it's even better
to do it FATAL to allow other backends to survive?
As for EOPNOTSUPP, maybe make a fallback to the workaround from the
first variant of the patch? (In my previous letter i forgot the pause
after break;, of cause)

>>> I don't know if Windows suffers from this type of problem.
>>> Unfortunately its equivalent functionality LockFile() looks non-ideal
>>> for this purpose: if your program crashes, the documentation is very
>>> vague on when exactly it is released by the OS, but it's not
>>> immediately on process exit.  That seems non-ideal for a control file
>>> you might want to access again very soon after a crash, to be able to
>>> recover.
>>
>> Unfortunately i've not had time to reproduce the problem and apply this patch on
>> Windows yet but i'm going to do it soon on windows 10. If a crc error
>> will occur there, then we might use the workaround from the first
>> variant of the patch.
> 
> Thank you for investigating.  I am afraid to read your results.

First of all it seemed to me that is not a problem at all since msdn
guarantees sector-by-sector atomicity.
"Physical Sector: The unit for which read and write operations to the device
are completed in a single operation. This is the unit of atomic write..."
https://learn.microsoft.com/en-us/windows/win32/fileio/file-buffering
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile
(Of course, only if the 512 bytes lays from the beginning of the file with a zero
offset, but this is our case. The current size of ControlFileData is
296 bytes at offset = 0.)

I tried to verify this fact experimentally and was very surprised.
Unfortunately it crashed in an hour during torture test:
2023-02-13 15:10:52.675 MSK [5704] LOG:  starting PostgreSQL 16devel, compiled by Visual C++ build 1929, 64-bit
2023-02-13 15:10:52.768 MSK [5704] LOG:  database system is ready to accept connections
@@@@@@ sizeof(ControlFileData) = 296
.........
2023-02-13 16:10:41.997 MSK [9380] ERROR:  calculated CRC checksum does not match value stored in file

But fortunately, this only happens when fsync=off.
Also i did several experiments with fsync=on and found more appropriate behavior:
The stress test with sizeof(ControlFileData) = 512+8 = 520 bytes failed in a 4,5 hours,
but the other one with ordinary sizeof(ControlFileData) = 296 not crashed in more than 12 hours.

Seems in that case the behavior corresponds to msdn.  So if it is possible
to use fsync() under windows when the GUC fsync is off it maybe a solution
for this problem. If so there is no need to lock the pg_control file under windows at all.

>> May be choose it in accordance with GUC wal_sync_method?
> 
> Here's a patch like that.  I don't know if it's a good idea for
> wal_sync_method to affect other kinds of files or not, but, then, it
> already does (fsync_writethough changes this behaviour).   

+1. Looks like it needs a little fix:

+++ b/src/common/controldata_utils.c
@@ -316,7 +316,7 @@ update_controlfile(const char *DataDir,
                         if (pg_fsync(fd) != 0)
                                 ereport(PANIC,
                                                 (errcode_for_file_access(),
-                                                errmsg("could not fdatasync file \"%s\": %m",
+                                                errmsg("could not fsync file \"%s\": %m",
                                                                 ControlFilePath)));

And it may be combined with 0001-Lock-pg_control-while-reading-or-writing.patch

> I would
> only want to consider this if we also stop choosing "open_datasync" as
> a default on at least Windows.

I didn't quite understand this point. Could you clarify it for me, please? If the performance
of UpdateMinRecoveryPoint() wasn't a problem we could just use fsync in all platforms.
   

Sincerely yours,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



pgsql-hackers by date:

Previous
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Next
From: Nathan Bossart
Date:
Subject: Re: Improve logging when using Huge Pages