Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed |
Date | |
Msg-id | 657815a2-5a89-fcc1-1c9d-d77a6986bc26@gmail.com Whole thread Raw |
In response to | Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
Responses |
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
|
List | pgsql-hackers |
Hello Bertrand and Michael, 23.01.2024 11:07, Bertrand Drouvot wrote: > On Tue, Jan 23, 2024 at 02:50:06PM +0900, Michael Paquier wrote: > >> Anyway, that's not the end of it. What should we do for snapshot >> snapshot records coming from the bgwriter? > What about? > > 3) depending on how stabilized this test (and others that suffer from "random" > xl_running_xacts) is, then think about the bgwriter. A recent buildfarm failure [1] reminds me of that remaining question. Here we have a slow machine (a successful run, for example [2], shows 541.13s duration of the test) and the following information logged: [13:55:13.725](34.411s) ok 25 - inactiveslot slot invalidation is logged with vacuum on pg_class [13:55:13.727](0.002s) not ok 26 - activeslot slot invalidation is logged with vacuum on pg_class [13:55:13.728](0.001s) # Failed test 'activeslot slot invalidation is logged with vacuum on pg_class' # at C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 229. [14:27:42.995](1949.267s) # poll_query_until timed out executing this query: # select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb' # expecting this output: # t # last actual query output: # f # with stderr: [14:27:42.999](0.004s) not ok 27 - confl_active_logicalslot updated [14:27:43.000](0.001s) # Failed test 'confl_active_logicalslot updated' # at C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235. Timed out waiting confl_active_logicalslot to be updated at C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235. --- 035_standby_logical_decoding_standby.log: 2024-06-06 13:55:07.715 UTC [9172:7] LOG: invalidating obsolete replication slot "row_removal_inactiveslot" 2024-06-06 13:55:07.715 UTC [9172:8] DETAIL: The slot conflicted with xid horizon 754. 2024-06-06 13:55:07.715 UTC [9172:9] CONTEXT: WAL redo at 0/4020A80 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 754, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 1, nunused: 0, dead: [48]; blkref #0: rel 1663/16384/2610, blk 0 2024-06-06 13:55:14.372 UTC [7532:1] [unknown] LOG: connection received: host=127.0.0.1 port=55328 2024-06-06 13:55:14.381 UTC [7532:2] [unknown] LOG: connection authenticated: identity="EC2AMAZ-P7KGG90\\pgrunner" method=sspi (C:/prog/bf/root/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_hba.conf:2) 2024-06-06 13:55:14.381 UTC [7532:3] [unknown] LOG: connection authorized: user=pgrunner database=postgres application_name=035_standby_logical_decoding.pl 2024-06-06 13:55:14.443 UTC [7532:4] 035_standby_logical_decoding.pl LOG: statement: select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb' 2024-06-06 13:55:14.452 UTC [7532:5] 035_standby_logical_decoding.pl LOG: disconnection: session time: 0:00:00.090 user=pgrunner database=postgres host=127.0.0.1 port=55328 # (there is no `invalidating obsolete replication slot "row_removal_activeslot"` message) ... 2024-06-06 14:27:42.675 UTC [4032:4] 035_standby_logical_decoding.pl LOG: statement: select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb' 2024-06-06 14:27:42.681 UTC [4032:5] 035_standby_logical_decoding.pl LOG: disconnection: session time: 0:00:00.080 user=pgrunner database=postgres host=127.0.0.1 port=58713 2024-06-06 14:27:43.095 UTC [7892:2] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. It's hard to determine from this info, why row_removal_activeslot was not invalidated, but running this test on a slowed down Windows VM, I (still) get the same looking failures caused by RUNNING_XACTS appeared just before `invalidating obsolete replication slot "row_removal_inactiveslot"`. So I would consider this failure as yet another result of bgwriter activity and add it to the list of known failures as such... [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-06-06%2012%3A36%3A11 [2] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=drongo&dt=2024-06-05%2017%3A03%3A13&stg=misc-check Best regards, Alexander
pgsql-hackers by date: