pgsql: postgres_fdw: re-issue cancel requests a few times if necessary. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: postgres_fdw: re-issue cancel requests a few times if necessary.
Msg-id E1tPop6-000z1i-OP@gemulon.postgresql.org
List pgsql-committers
postgres_fdw: re-issue cancel requests a few times if necessary.

Despite the best efforts of commit 0e5c82380, we're still seeing
occasional failures of postgres_fdw's query_cancel test in the
buildfarm.  Investigation suggests that its 100ms timeout is
still not enough to reliably ensure that the remote side starts
the query before receiving the cancel request --- and if it
hasn't, it will just discard the request because it's idle.

We discussed allowing a cancel request to kill the next-received
query, but that would have wide and perhaps unpleasant side-effects.
What seems safer is to make postgres_fdw do what a human user would
likely do, which is issue another cancel request if the first one
didn't seem to do anything.  We'll keep the same overall 30 second
grace period before concluding things are broken, but issue additional
cancel requests after 1 second, then 2 more seconds, then 4, then 8.
(The next one in series is 16 seconds, but we'll hit the 30 second
timeout before that.)
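The retry policy can be modeled in a few lines of C. This is a hypothetical sketch, not the code in connection.c. The names cancel_schedule and first_effective_cancel and the two constants are assumptions chosen for illustration, and no libpq calls are made. The sketch only computes when cancel requests would be sent. It also runs a toy model of the race described earlier, in which a request that arrives before the remote query starts is discarded.

```c
#include <stdbool.h>

/* Assumed values, taken from the commit message text. */
#define CANCEL_GRACE_MS      30000   /* overall grace period: 30 s */
#define FIRST_RETRY_DELAY_MS  1000   /* first re-issue after 1 s */

/*
 * Hypothetical helper: write into times[] the elapsed times (ms after the
 * first cancel) at which cancel requests are sent.  The initial request
 * goes out at t = 0.  Re-issues follow after delays of 1 s, 2 s, 4 s, ...
 * as long as the next send would still fall inside the grace period.
 * Returns the number of entries written.
 */
int
cancel_schedule(int grace_ms, int first_delay_ms, int *times, int max_times)
{
    int n = 0;
    int t = 0;
    int delay = first_delay_ms;

    if (max_times <= 0)
        return 0;
    times[n++] = t;                 /* initial cancel request */
    while (n < max_times && t + delay < grace_ms)
    {
        t += delay;                 /* no effect seen yet: wait, re-issue */
        times[n++] = t;
        delay *= 2;                 /* back off: 1 s, 2 s, 4 s, 8 s, ... */
    }
    return n;
}

/*
 * Toy model of the race (an assumption, not real server behavior timing):
 * a cancel that reaches the remote backend while it is still idle, i.e.
 * before query_start_ms, is discarded.  Returns the send time of the
 * first request that lands, or -1 if none lands within the grace period.
 */
int
first_effective_cancel(int query_start_ms)
{
    int times[16];
    int n = cancel_schedule(CANCEL_GRACE_MS, FIRST_RETRY_DELAY_MS,
                            times, 16);

    for (int i = 0; i < n; i++)
        if (times[i] >= query_start_ms)
            return times[i];
    return -1;
}
```

With these numbers, requests go out at 0, 1, 3, 7, and 15 seconds. The next re-issue would fall at 31 seconds, past the 30 second limit. In the toy model, a remote query that starts 50 ms after the first cancel is caught by the re-issue at 1 second, instead of leaving the wait to run out the full grace period.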

Having done that, revert the timeout in query_cancel.sql to 10 ms.
That will still be enough on most machines, most of the time, for
the remote query to start; but now we're intentionally risking the
race condition occurring sometimes in the buildfarm, so that the
repeat-cancel code path will get some testing.

As before, back-patch to v17.  We might eventually contemplate
back-patching this further, and/or adding similar logic to dblink.
But given the lack of field complaints to date, this feels like
mostly an exercise in test case stabilization, so v17 is enough.

Discussion: https://postgr.es/m/colnv3lzzmc53iu5qoawynr6qq7etn47lmggqr65ddtpjliq5d@glkveb4m6nop

Branch
------
REL_17_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/89962bfef624822786f8fc6268307407fd10bc7f

Modified Files
--------------
contrib/postgres_fdw/connection.c              | 96 ++++++++++++++++++++------
contrib/postgres_fdw/expected/query_cancel.out |  6 +-
contrib/postgres_fdw/sql/query_cancel.sql      |  6 +-
3 files changed, 84 insertions(+), 24 deletions(-)
