Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Date
Msg-id ZZ7GpII4bAYN+jT5@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
List pgsql-hackers
Hi,

On Wed, Jan 10, 2024 at 05:00:01PM +0300, Alexander Lakhin wrote:
> 10.01.2024 12:46, Bertrand Drouvot wrote:
> 
> > Would it be possible to also send the standby logs?
> 
> Yes, please look at the attached logs. This time I've build postgres with
> -DWAL_DEBUG and ran tests with TEMP_CONFIG as below:
> wal_keep_size=1GB
> wal_debug = on
> log_autovacuum_min_duration = 0
> log_statement = 'all'
> log_min_messages = INFO
> 
> The archive attached contains logs from four runs:
> recovery-1-ok -- an example of successful run for reference
> recovery-7-pruning and recovery-19-pruning -- failures with a failed
>  subtest 'activeslot slot invalidation is logged with on-access pruning'
> recovery-15-vacuum_pg_class -- a failure with a failed
>  subtest 'activeslot slot invalidation is logged with vacuum on pg_class'

Thanks a lot for the testing!

> is an absent message 'obsolete replication slot "row_removal_activeslot"'

Looking at recovery-15-vacuum_pg_class/i035_standby_logical_decoding_standby.log here:

Yeah, the missing message has to come from InvalidatePossiblyObsoleteSlot().

In case of an active slot we first call ReportSlotInvalidation with the second
parameter set to true (to emit the "terminating" message), then SIGTERM the active
process and then (later) we should call the other ReportSlotInvalidation()
call with the second parameter set to false (to emit the message that we don't
see here).

So it means InvalidatePossiblyObsoleteSlot() did not trigger the second ReportSlotInvalidation()
call. 

The thing it that it looks like we exited the loop in InvalidatePossiblyObsoleteSlot()
because there is more messages from the startup process (789618) after the:


"
2024-01-10 11:00:29.109 UTC [789618] LOG:  terminating process 790377 to release replication slot
"row_removal_activeslot"
"

one.

Do you think you could try to add more debug messages in InvalidatePossiblyObsoleteSlot()
to understand why the second call to ReportSlotInvalidation() is not done and IIUC
why/how we exited the loop?

FWIW, I did try to reproduce by launching pg_recvlogical and then kill -SIGSTOP
it. Then producing a conflict, I'm able to get the first message and not the second
one (which is expected). But the startup process does not exit the loop, which is
expected here.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Relation bulk write facility
Next
From: Bernd Helmle
Date:
Subject: Re: [PATCH] Add sortsupport for range types and btree_gist