Re: [EXTERNAL] Re: Add non-blocking version of PQcancel - Mailing list pgsql-hackers

From Jelte Fennema-Nio
Subject Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Date
Msg-id CAGECzQQFvsuBcT1Bu3eDWDWuxt=GT0MsGJJKK_5og51Rk-4q7Q@mail.gmail.com
Whole thread Raw
In response to Re: [EXTERNAL] Re: Add non-blocking version of PQcancel  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
List pgsql-hackers
On Thu, 21 Nov 2024 at 02:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, given that info, Jelte's unapplied 0002 patch earlier in the
> thread is not the answer, because this is about dropping a query
> cancel not about losing a timeout interrupt.

Agreed that 0002 does not fix the issue re-reported by Andres (let's
call this issue number 1). But I'm still of the opinion that 0002
fixes a real bug: i.e. a bug which causes timeouts.spec to randomly
fail[1] (let's call this issue number 2).
> This seems to fix the problem here.  Thoughts?

Overall, a good approach to fix issue number 1. I think it would be
best if this was integrated into libpqsrv_cancel instead though. That
way the dblink would benefit from it too.

nit: Maybe call it RETRY_CANCEL_TIME. The RE_ prefix wasn't instantly
obvious what it meant to me, it seemed like an abbreviation when I first saw it.

> BTW, while I didn't do it in the attached, I'm tempted to greatly
> reduce the 100ms delay in query_cancel.sql.  If this does make it
> more robust, we shouldn't need that much time anymore.

Seems sensible to me.


Finally there's a third issue[2] (let's call this issue number 3).
Alexander did some investigation into this issue too[3]. For this one
I have a hard time understanding what is going on, or at least how
this issue only seems to apply to cancel connections. From his
description of the problem and my reading of the code it seems that if
we fail to send the StartupPacket/CancelRequest due to a socket error,
we set the write_failed flag. But we don't actually check this flag
during the CONNECTION_AWAITING_RESPONSE phase of PQconnectPoll, so we
just wait until we reach a timeout because the server never sends us
anything.


[1]:
https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#timeouts.spec_failed_because_of_statement_cancelled_due_to_unexpected_reason
[2]:
https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#dblink.sql_.28and_postgres_fdw.sql.29_fail_on_Windows_due_to_the_cancel_packet_not_sent
[3]: https://www.postgresql.org/message-id/5ea25e4d-1ee2-b9bf-7806-119ffa658826@gmail.com



pgsql-hackers by date:

Previous
From: yuansong
Date:
Subject: Re:Re: backup server core when redo btree_xlog_insert that type is XLOG_BTREE_INSERT_POST
Next
From: Joe Conway
Date:
Subject: Re: Replace current implementations in crypt() and gen_salt() to OpenSSL