Re: [EXTERNAL] Re: Add non-blocking version of PQcancel - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: [EXTERNAL] Re: Add non-blocking version of PQcancel |
Date | |
Msg-id | 4db099c8-4a52-3cc4-e970-14539a319466@gmail.com Whole thread Raw |
In response to | Re: [EXTERNAL] Re: Add non-blocking version of PQcancel (Noah Misch <noah@leadboat.com>) |
Responses |
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
|
List | pgsql-hackers |
Hello hackers, 30.03.2024 01:17, Noah Misch wrote: > On Fri, Mar 29, 2024 at 09:17:55AM +0100, Jelte Fennema-Nio wrote: >> On Thu, 28 Mar 2024 at 19:03, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Could we make this test bulletproof by using an injection point? >>> If not, I remain of the opinion that we're better off without it. >> Possibly, and if so, I agree that would be better than the currently >> added test. But I honestly don't feel like spending the time on >> creating such a test. > The SQL test is more representative of real applications, and it's way simpler > to understand. In general, I prefer 6-line SQL tests that catch a problem 10% > of the time over injection point tests that catch it 100% of the time. For > low detection rate to be exciting, it needs to be low enough to have a serious > chance of all buildfarm members reporting green for the bad commit. With ~115 > buildfarm members running in the last day, 0.1% detection rate would have been > low enough to bother improving, but 4% would be high enough to call it good. As a recent buildfarm failure on orlingo (which tests asan-enabled builds) [1] shows, that test can still fail: 70/70 postgresql:postgres_fdw-running / postgres_fdw-running/regress ERROR 278.67s exit status 1 @@ -2775,6 +2775,7 @@ SET LOCAL statement_timeout = '10ms'; select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long ERROR: canceling statement due to statement timeout +WARNING: could not get result of cancel request due to timeout COMMIT; (from the next run we can see normal duration: "postgres_fdw-running/regress OK 6.30s ") I reproduced the failure with an asan-enabled build on a slowed-down VM and as far as I can see, it's caused by the following condition in ProcessInterrupts(): /* * If we are reading a command from the client, just ignore the cancel * request --- sending an extra error message won't accomplish * anything. Otherwise, go ahead and throw the error. */ if (!DoingCommandRead) { LockErrorCleanup(); ereport(ERROR, (errcode(ERRCODE_QUERY_CANCELED), errmsg("canceling statement due to user request"))); } I think this failure can be reproduced easily (without asan/slowing down) with this modification: @@ -4630,6 +4630,7 @@ PostgresMain(const char *dbname, const char *username) idle_session_timeout_enabled = false; } +if (rand() % 10 == 0) pg_usleep(10000); /* * (5) disable async signal conditions again. * Running this test in a loop (for ((i=1;i<=100;i++)); do \ echo "iteration $i"; make -s check -C contrib/postgres_fdw/ || break; \ done), I get: ... iteration 56 # +++ regress check in contrib/postgres_fdw +++ # initializing database system by copying initdb template # using temp instance on port 55312 with PID 991332 ok 1 - postgres_fdw 20093 ms 1..1 # All 1 tests passed. iteration 57 # +++ regress check in contrib/postgres_fdw +++ # initializing database system by copying initdb template # using temp instance on port 55312 with PID 992152 not ok 1 - postgres_fdw 62064 ms 1..1 ... --- .../contrib/postgres_fdw/expected/postgres_fdw.out 2024-06-22 02:52:42.991574907 +0000 +++ .../contrib/postgres_fdw/results/postgres_fdw.out 2024-06-22 14:43:43.949552927 +0000 @@ -2775,6 +2775,7 @@ SET LOCAL statement_timeout = '10ms'; select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long ERROR: canceling statement due to statement timeout +WARNING: could not get result of cancel request due to timeout COMMIT; I also came across another failure of the test: @@ -2774,7 +2774,7 @@ BEGIN; SET LOCAL statement_timeout = '10ms'; select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long -ERROR: canceling statement due to statement timeout +ERROR: canceling statement due to user request COMMIT; which is reproduced with a sleep added here: @@ -1065,6 +1065,7 @@ exec_simple_query(const char *query_string) */ parsetree_list = pg_parse_query(query_string); +pg_usleep(11000); [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-06-20%2009%3A52%3A04 Best regards, Alexander
pgsql-hackers by date: