Race conditions in 019_replslot_limit.pl - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Race conditions in 019_replslot_limit.pl
Date
Msg-id 83b46e5f-2a52-86aa-fa6c-8174908174b8@iki.fi
Responses Re: Race conditions in 019_replslot_limit.pl  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
While looking at recent failures in the new 028_pitr_timelines.pl 
recovery test, I noticed that there have been a few failures in the 
buildfarm in the recoveryCheck phase even before that, in the 
019_replslot_limit.pl test.

For example:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2022-02-14%2006%3A30%3A04

[07:42:23] t/018_wal_optimize.pl ................ ok    12403 ms ( 0.00 usr  0.00 sys +  1.40 cusr  0.63 csys =  2.03 CPU)
# poll_query_until timed out executing this query:
# SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep3'
# expecting this output:
# lost
# last actual query output:
# unreserved

and:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2022-02-15%2011%3A00%3A08

#   Failed test 'have walsender pid 3682154
# 3682136'
#   at t/019_replslot_limit.pl line 335.
#                   '3682154
# 3682136'
#     doesn't match '(?^:^[0-9]+$)'

In the latter case it looks like two walsenders were active at once, 
which confuses the test. I'm not sure what's happening in the first 
case, but at a quick glance it looks like some kind of race condition.
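
To illustrate why the second failure trips the check: the pattern in the log, '(?^:^[0-9]+$)', only accepts a single run of digits, so output containing two pids on separate lines cannot match. Below is a minimal sketch of that behavior in Python rather than Perl; the function name is hypothetical and the pattern is copied from the failure output without Perl's (?^:...) wrapper. It assumes the query simply returned one pid per row.

```python
import re

# Pattern taken from the failure output, without Perl's (?^:...) wrapper.
# No multiline flag, so ^ and $ anchor to the whole string.
PID_PATTERN = re.compile(r"^[0-9]+$")


def looks_like_single_pid(query_output: str) -> bool:
    """Return True only if the output is exactly one run of digits."""
    return PID_PATTERN.match(query_output) is not None


single = "3682154"
double = "3682154\n3682136"  # the two-line output from the serinus failure

print(looks_like_single_pid(single))  # True
print(looks_like_single_pid(double))  # False
```

So the assertion itself is behaving as written; the open question is why a second walsender was still around when the query ran.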

Has anyone looked into these yet?

- Heikki


