> On 4 Apr 2024, at 23:46, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 4 Apr 2024, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> A minimum fix that seems to make this work better is as attached,
>> but I feel like somebody ought to examine all the IPC::Run::timer
>> and IPC::Run::timeout calls and see which ones are mistaken.
>> It's a little scary to convert a timeout to a timer because of
>> the hazard that someplace that would need to be checking for
>> is_expired isn't.
>
> Skimming this and a few callsites it seems reasonable to use a timer instead of
> a timeout, but careful study is needed to make sure we're not introducing
> anything subtly wrong in the other direction.
Sharing a few preliminary results from looking at this, the attached passes
check-world but need more study/testing.
It seems wrong to me that we die() in query and query_until rather than giving
the caller the power to decide how to proceed. We have even documented that we
do just this:
"Dies on failure to invoke psql, or if psql fails to connect. Errors
occurring later are the caller's problem"
Turning the timeout into a timer and returning undef along with logging a test
failure in case of expiration seems a bit saner (maybe Andrew can suggest an
API which has a better Perl feel to it). Most callsites don't need any changes
to accommodate for this, the attached 0002 implements this timer change and
modify the few sites that need it, converting one to plain query() where the
added complexity of query_until isn't required.
A few other comments on related things that stood out while reviewing:
The tab completion test can use the API call for automatically restart the
timer to reduce the complexity of check_completion a hair. Done in 0001 (but
really not necessary).
Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in
the recovery tests:
$back_q->query_until(
qr/logical_slot_get_changes/, q(
\echo logical_slot_get_changes
SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
));
... <other tests> ...
# Since there are no slots in standby_slot_names, the function
# pg_logical_slot_get_changes should now return, and the session can be
# stopped.
$back_q->quit;
There is no guarantee that pg_logical_slot_get_changes has returned when
reaching this point. This might still work as intended, but the comment is
slightly misleading IMO.
recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e
in a background session which it then skips terminating. Calling ->quit is
mandated by the API, in turn required by IPC::Run. Calling ->quit on the
process makes the test fail from the process having already exited, but we can
call ->finish directly like we do in test_misc/t/005_timeouts.pl. 0003 fixes
this.
--
Daniel Gustafsson