On Fri, 30 Aug 2024 at 22:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Jelte Fennema-Nio <postgres@jeltef.nl> writes:
> > On Fri, Aug 30, 2024, 21:21 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> While we're piling on, has anyone noticed that *non* Windows buildfarm
> >> animals are also failing this test pretty frequently?
>
> > Yes. Fixes are here (see the ~10 emails above in the thread for details):
> > https://www.postgresql.org/message-id/CAGECzQQO8Cn2Rw45xUYmvzXeSSsst7-bcruuzUfMbGQc3ueSdw@mail.gmail.com
>
> Hmm. I'm not convinced that 0001 is an actual *fix*, but it should
> at least reduce the frequency of occurrence a lot, which'd help.
I also don't think it's an actual fix, but I couldn't think of a way
to fix this. And since this only happens if you cancel right at the
start of a postgres_fdw query, I don't think it's worth investing too
much time on a fix.
> I don't want to move the test case to where you propose, because
> that's basically not sensible. But can't we avoid remote estimates
> by just cross-joining ft1 to itself, and not using the tables for
> which remote estimate is enabled?
Yeah that should work too (I just saw your next email, where you said
it's committed like this).
> I think 0002 is probably outright wrong, or at least the change to
> disable_statement_timeout is. Once we get to that, we don't want
> to throw a timeout error any more, even if an interrupt was received
> just before it.
The disable_statement_timeout change was not the part of that patch
that was necessary for stable output, only the change in the first
branch of enable_statement_timeout was necessary. The reason being
that enable_statement_timeout is called multiple times for a query,
because start_xact_command is called multiple times in
exec_simple_query. The change to disable_statement_timeout just seemed
like the logical extension of that change, especially since there was
basically a verbatim copy of disable_statement_timeout in the second
branch of enable_statement_timeout.
To make sure I understand your suggestion correctly: Are you saying
you would want to completely remove the outstanding interrupt if it
was caused by de statement_timout when disable_statement_timeout is
called? Because I agree that would probably make sense, but that
sounds like a more impactful change. But the current behaviour seems
strictly worse than the behaviour proposed in the patch to me, because
currently the backend would still be interrupted, but the error would
indicate a reason for the interrupt that is simply incorrect i.e. it
will say it was cancelled due to a user request, which never happened.