From: Masahiko Sawada
Subject: Re: Conflict detection for update_deleted in logical replication
Msg-id: CAD21AoCuG9R9S9wjaJEsM4n1JX+=3SFBDqLqEJiNPWkbfwJeHA@mail.gmail.com
In response to: RE: Conflict detection for update_deleted in logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List: pgsql-hackers
On Wed, Jan 8, 2025 at 3:00 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, January 8, 2025 6:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> > On Wed, Jan 8, 2025 at 1:53 AM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > > On Wed, Jan 8, 2025 at 3:02 PM Masahiko Sawada
> > <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond
> > <nisha.moond412@gmail.com> wrote:
> > > > >
> > > > > [3] Test with pgbench run on both publisher and subscriber.
> > > > >
> > > > > Test setup:
> > > > > - Tests performed on pgHead + v16 patches
> > > > > - Created a pub-sub replication system.
> > > > > - Parameters for both instances were:
> > > > >
> > > > >    shared_buffers = 30GB
> > > > >    min_wal_size = 10GB
> > > > >    max_wal_size = 20GB
> > > > >    autovacuum = false
> > > >
> > > > Since you disabled autovacuum on the subscriber, dead tuples created
> > > > by non-HOT updates accumulate anyway, regardless of the
> > > > detect_update_deleted setting. Is that right?
> > > >
> > >
> > > I think the HOT-pruning mechanism will remove dead tuples during
> > > update operations even when autovacuum is disabled.
> >
> > True, but why disable autovacuum? It seems that case1-2_setup.sh
> > doesn't specify a fillfactor, which makes HOT updates less likely to happen.
>
> IIUC, we disable autovacuum as a general practice in read-write tests for
> stable TPS numbers.
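
(For the fillfactor point above, a sketch of how the setup script
could leave room for HOT updates; pgbench_accounts is the table
pgbench creates, and 80 is just an example value:

    -- leave 20% free space per heap page so pgbench's updates can stay HOT
    ALTER TABLE pgbench_accounts SET (fillfactor = 80);
    -- rewrite existing pages so they honor the new fillfactor
    VACUUM FULL pgbench_accounts;
)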

Okay. TBH I'm not sure what we can say with these results. At a
glance, in a typical bi-directional-like setup, we can interpret these
results as showing that if users turn retain_conflict_info on, TPS
drops by 50%. But I'm not sure this 50% dip is the worst case users
could face. It could be better in practice thanks to autovacuum, or it
could get even worse due to further bloat if we ran the test longer.

Suppose users face a 50% performance dip due to the dead tuple
retention needed for update_deleted detection; is there any way for
them to improve the situation? For example, trying to advance
slot.xmin more frequently might help reduce dead tuple accumulation. I
think it would be good if we had a way to balance the publisher's
performance against the subscriber's.
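
As a rough way to see how much is being held back at any point, one
could watch the slots' horizons alongside per-table dead tuple counts
(a sketch using the existing monitoring views, nothing patch-specific):

    -- how far replication slots are holding back cleanup
    SELECT slot_name, xmin, catalog_xmin
    FROM pg_replication_slots;

    -- dead tuples accumulating in the subscriber's user tables
    SELECT relname, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;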

In test case 3, we observed a -53% performance dip, which is worse
than the results of test case 5 with wal_receiver_status_interval =
100s. Given that in test case 5 with wal_receiver_status_interval =
100s we cannot remove dead tuples for most of the 120s test time, we
probably could not remove dead tuples for a long time in test case 3
either. I expected the apply worker to get remote transaction XIDs and
try to advance slot.xmin more frequently, so this performance dip
surprised me. I would like to know how many times the apply worker got
remote transaction XIDs and succeeded in advancing slot.xmin during
the test.
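
One crude way to measure that during the run is to sample the slot's
xmin from psql and count how often it changes (a sketch; the slot name
below is a placeholder, so substitute whatever slot the patch set
actually creates):

    -- sample once per second during the benchmark;
    -- 'conflict_detection_slot' is a placeholder name
    SELECT now(), xmin
    FROM pg_replication_slots
    WHERE slot_name = 'conflict_detection_slot';
    \watch 1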

>
> >
> > I understand that a certain performance dip happens due to dead tuple
> > retention, which is fine, but I'm surprised that the TPS decreased by 50% within
> > 120 seconds. Does the TPS get even worse with a longer test?
>
> We will try to increase the time and run the test again.
>
> > I did a quick
> > benchmark where I completely disabled removing dead tuples (with
> > autovacuum=off and a logical slot) and ran pgbench, but I didn't see such a
> > precipitous dip.
>
> I think a logical slot only retains dead tuples in the system catalogs,
> so the TPS on user tables would not be affected that much.

You're right, I missed it.
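
Indeed, a logical slot pins only catalog_xmin; its xmin stays NULL, so
vacuum can still clean user tables. A quick way to confirm (the slot
name here is just an example):

    -- 'check_slot' is an arbitrary example name; test_decoding is the
    -- contrib output plugin
    SELECT pg_create_logical_replication_slot('check_slot', 'test_decoding');
    -- for a logical slot, xmin is NULL and only catalog_xmin is set
    SELECT slot_name, xmin, catalog_xmin FROM pg_replication_slots;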

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


