Re: Transaction timeout - Mailing list pgsql-hackers

From Andrey M. Borodin
Subject Re: Transaction timeout
Date
Msg-id A738D5A4-23BA-4AD6-A01E-EF34A5A98C57@yandex-team.ru
In response to Re: Transaction timeout  (Andres Freund <andres@anarazel.de>)
Responses Re: Transaction timeout
List pgsql-hackers
Alexander, thanks for pushing this! This is a small but long-awaited feature.

> On 16 Feb 2024, at 02:08, Andres Freund <andres@anarazel.de> wrote:
>
> Isn't this test going to be very fragile on busy / slow machines? What if the
> pg_sleep() takes one second, because there were other tasks to schedule?  I'd
> be surprised if this didn't fail under valgrind, for example.

Even the more robust tests that were bullet-proof in CI have since exhibited some failures on the buildfarm. Currently there are 5 failures through this weekend.
The failing tests cover the interaction of idle_in_transaction_session_timeout vs transaction_timeout (session 5), and rescheduling of transaction_timeout (session 6).
Symptoms:

[0] The transaction timeout fires while it is still being scheduled. It seems the SET was running too long.
 step s6_begin: BEGIN ISOLATION LEVEL READ COMMITTED;
 step s6_tt: SET statement_timeout = '1s'; SET transaction_timeout = '10ms';
+s6: FATAL:  terminating connection due to transaction timeout
 step checker_sleep: SELECT pg_sleep(0.1);

[1] The 10ms transaction timeout is not detected after 1s.
step s6_check: SELECT count(*) FROM pg_stat_activity WHERE application_name = 'isolation/timeouts/s6';
 count
 -----
-    0
+    1

[2] The transaction timeout is not detected in either session 5 or session 6.

So far no single animal has reported failures twice, so it's hard to say anything about frequency. But it seems to be a significant source of failures.

So far I have these ideas:

1. Remove test sessions 5 and 6. But it seems a little strange that session 3 did not fail at all (it tests the interaction of statement_timeout and transaction_timeout). That test is very similar to test session 5...
2. Increase wait times.
step checker_sleep    { SELECT pg_sleep(0.1); }
This seems not to be enough to observe from pg_stat_activity that the backend timed out. But increasing it won't help with [0].
3. Reuse the waiting INJECTION_POINT from [3] to make the timeout tests deterministic and safe from race conditions. With waiting injection points we can wait as long as needed in the current environment.
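
To illustrate the difference between ideas 2 and 3, here is a minimal Python sketch. It is only an analogy, not PostgreSQL code and not the injection point API: run_backend, check_with_fixed_sleep and check_with_wait are hypothetical names, and a threading.Event stands in for whatever signal a waiting injection point would provide. The fixed sleep misses the timeout when the "backend" is scheduled late, while the explicit wait does not depend on machine speed.

```python
import threading
import time


def run_backend(timeout_s, startup_delay_s, timed_out):
    """Hypothetical stand-in for a backend that hits its timeout."""
    time.sleep(startup_delay_s)  # scheduling delay on a busy or slow machine
    time.sleep(timeout_s)        # the configured timeout elapses
    timed_out.set()              # the event a waiting test could block on


def _start_backend(timeout_s, startup_delay_s):
    timed_out = threading.Event()
    worker = threading.Thread(
        target=run_backend, args=(timeout_s, startup_delay_s, timed_out)
    )
    worker.start()
    return timed_out, worker


def check_with_fixed_sleep(timeout_s, startup_delay_s, sleep_s):
    """Idea 2: sleep a fixed time, then look once."""
    timed_out, worker = _start_backend(timeout_s, startup_delay_s)
    time.sleep(sleep_s)
    observed = timed_out.is_set()  # may be False if the backend ran late
    worker.join()
    return observed


def check_with_wait(timeout_s, startup_delay_s, max_wait_s):
    """Idea 3: block until signalled; max_wait_s is only a safety net."""
    timed_out, worker = _start_backend(timeout_s, startup_delay_s)
    observed = timed_out.wait(max_wait_s)  # returns as soon as the event is set
    worker.join()
    return observed


if __name__ == "__main__":
    # A 10ms timeout on a "machine" that needs 500ms before it even starts.
    print("fixed sleep saw timeout:", check_with_fixed_sleep(0.01, 0.5, 0.05))
    print("explicit wait saw timeout:", check_with_wait(0.01, 0.5, 10.0))
```

In the sketch, raising sleep_s only moves the threshold at which the fixed-sleep check starts failing, while the waiting variant returns as soon as the event is set and costs nothing extra on fast machines.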

Any advice is welcome.


Best regards, Andrey Borodin.


[0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-02-16%2020%3A06%3A51
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-02-16%2001%3A45%3A10
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2024-02-17%2001%3A55%3A45
[3] https://www.postgresql.org/message-id/0925F9A9-4D53-4B27-A87E-3D83A757B0E0@yandex-team.ru

