Re: BUG #16154: pg_ctl restart with a logfile fails sometimes (on Windows) - Mailing list pgsql-bugs

From Alexander Lakhin
Subject Re: BUG #16154: pg_ctl restart with a logfile fails sometimes (on Windows)
Date
Msg-id e5179494-715e-f8a3-266b-0cf52adac8f4@gmail.com
In response to BUG #16154: pg_ctl restart with a logfile fails sometimes (on Windows)  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #16154: pg_ctl restart with a logfile fails sometimes (on Windows)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #16154: pg_ctl restart with a logfile fails sometimes (on Windows)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
06.12.2019 10:00, PG Bug reporting form wrote:
When performing regression tests on Windows, intermittent failures are
observed, e.g. in the src/bin/pg_basebackup test:
vcregress taptest src/bin/pg_basebackup
...
t/010_pg_basebackup.pl ... 10/106 Bailout called.  Further testing stopped: 
system pg_ctl failed
FAILED--Further testing stopped: system pg_ctl failed


waiting for server to start....The process cannot access the file because it
is being used by another process.stopped waiting
pg_ctl: could not start server

The issue is caused by sporadic "pg_ctl ... restart -l logfile" failures.

To reproduce this issue reliably, I propose a simple demo patch (delay_after_unlink_pid).
With the delay added, the "pg_ctl ... restart -l logfile" command (and "vcregress taptest src/bin/pg_basebackup") always fails.
The error message is not very informative, but debugging shows that the file in question is the log file specified when running the command:
"C:\Windows\system32\cmd.exe" /C ""C:/src/postgresql/tmp_install/bin/postgres.exe" -D "C:/src/postgresql/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_main_data/pgdata" --cluster-name=main < "nul" >> "C:/src/postgresql/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_main.log" 2>&1"

If this file is still held open by the previous server's shell (this can happen when the previous server instance has unlinked its pid file but its CMD shell is still running), the next CMD start fails with the aforementioned error message.

To fix this issue, I propose the attached patch (fix_logfile_sharing_violation).
With the patch, pg_ctl waits for the log file to become available (for up to 30 seconds). If the file still cannot be opened (this can be reproduced with a larger delay in the demo patch), you'll get a more meaningful message:
pg_ctl: could not access log file "C:/src/postgresql/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_main.log": Permission denied
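
The retry-with-timeout idea behind the fix can be sketched roughly as follows. This is only an illustration in Python, not the attached patch (which is not reproduced here). The function name wait_for_log_file, the 0.1 second polling interval, and the open_func parameter are assumptions made for the sketch; only the 30 second limit comes from the description above.

```python
import time


def wait_for_log_file(path, timeout=30.0, interval=0.1, open_func=open):
    """Retry opening `path` for append until it works or `timeout` elapses.

    Returns None on success, or the last PermissionError if the file
    never became available. All names here are hypothetical.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Append mode mirrors the ">>" redirection used to start the server.
            with open_func(path, "a"):
                return None
        except PermissionError as exc:
            # On Windows, a sharing violation is reported as PermissionError.
            if time.monotonic() >= deadline:
                return exc
            time.sleep(interval)
```

A caller would start the server only when the function returns None, and otherwise report the returned error together with the log file path instead of the generic "could not start server" message.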

Best regards,
Alexander
Attachment
