Re: Conflict detection and logging in logical replication - Mailing list pgsql-hackers

From: Nisha Moond
Subject: Re: Conflict detection and logging in logical replication
Msg-id: CABdArM6gULXDHKwpuWWfLeHCpkrnbv4oOUw7igiW7ziPxLp5Gg@mail.gmail.com
In response to: Re: Conflict detection and logging in logical replication (shveta malik <shveta.malik@gmail.com>)
List: pgsql-hackers

On Mon, Aug 5, 2024 at 10:05 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Aug 5, 2024 at 9:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Aug 2, 2024 at 6:28 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> > >
> > > Performance tests done on the v8-0001 and v8-0002 patches, available at [1].
> > >
> >
> > Thanks for doing the detailed tests for this patch.
> >
> > > The purpose of the performance tests is to measure the impact on
> > > logical replication with track_commit_timestamp enabled, as this
> > > involves fetching the commit_ts data to determine
> > > delete_differ/update_differ conflicts.
> > >
> > > Fortunately, we did not see any noticeable overhead from the new
> > > commit_ts fetch and comparison logic. The only notable impact is
> > > potential overhead from logging conflicts if they occur frequently.
> > > Therefore, enabling conflict detection by default seems feasible, and
> > > introducing a new detect_conflict option may not be necessary.
> > >
> > ...
> > >
> > > Test 1: create conflicts on Sub using pgbench.
> > > ----------------------------------------------------------------
> > > Setup:
> > >  - Both publisher and subscriber have pgbench tables created as-
> > >       pgbench -p $node1_port postgres -qis 1
> > >  - At Sub, a subscription created for all the changes from Pub node.
> > >
> > > Test Run:
> > >  - To test, ran pgbench for 15 minutes on both nodes simultaneously,
> > > which led to concurrent updates and update_differ conflicts on the
> > > Subscriber node.
> > >  Command used to run pgbench on both nodes-
> > >         ./pgbench postgres -p 8833 -c 10 -j 3 -T 300 -P 20
> > >
> > > Results:
> > > For each case, note the “tps” and total time taken by the apply-worker
> > > on Sub to apply the changes coming from Pub.
> > >
> > > Case1: track_commit_timestamp = off, detect_conflict = off
> > >     Pub-tps = 9139.556405
> > >     Sub-tps = 8456.787967
> > >     Time of replicating all the changes: 19min 28s
> > > Case 2 : track_commit_timestamp = on, detect_conflict = on
> > >     Pub-tps = 8833.016548
> > >     Sub-tps = 8389.763739
> > >     Time of replicating all the changes: 20min 20s
> > >
> >
> > Why is there a noticeable tps (~3%) reduction in publisher TPS? Is it
> > the impact of track_commit_timestamp = on or something else?

When both the publisher and subscriber nodes are on the same machine,
we observe a decrease in the publisher's TPS when
'track_commit_timestamp' is ON for the subscriber. Testing on pgHead
(without the patch) showed a similar reduction in the publisher's
TPS.

Test Setup: The test was conducted with the same setup as Test-1.

Results:
Case 1: pgHead - 'track_commit_timestamp' = OFF
  - Pub TPS: 9306.25
  - Sub TPS: 8848.91
Case 2: pgHead - 'track_commit_timestamp' = ON
  - Pub TPS: 8915.75
  - Sub TPS: 8667.12

On pgHead too, the publisher's TPS dropped by ~400 (about 4%) when
'track_commit_timestamp' was enabled on the subscriber.
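
As a quick check of that figure, the reduction can be recomputed from
the pgHead numbers above. This is only a minimal sketch: the helper
name pct_drop is made up for illustration and is not part of the
scripts used for these tests.

```python
# Hypothetical helper for illustration; not part of the original test scripts.
def pct_drop(before: float, after: float) -> float:
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100.0


# pgHead results with publisher and subscriber on the same machine,
# copied from the results listed above.
pub_off, pub_on = 9306.25, 8915.75  # publisher TPS, track_commit_timestamp OFF / ON
sub_off, sub_on = 8848.91, 8667.12  # subscriber TPS, track_commit_timestamp OFF / ON

pub_drop_tps = pub_off - pub_on
pub_drop_pct = pct_drop(pub_off, pub_on)
sub_drop_tps = sub_off - sub_on
sub_drop_pct = pct_drop(sub_off, sub_on)

print(f"Pub: -{pub_drop_tps:.2f} TPS ({pub_drop_pct:.2f}%)")
print(f"Sub: -{sub_drop_tps:.2f} TPS ({sub_drop_pct:.2f}%)")
```

The same helper can be applied to the Test-1 numbers quoted earlier to
compare the patched and unpatched runs.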

Additionally, profiling the walsender on the publisher showed that the
overhead in Case 2 was mainly in the DecodeCommit() call stack, with
write operations becoming slower, especially in
logicalrep_write_update() and OutputPluginWrite().

Case 1: 'track_commit_timestamp' = OFF
--11.57%--xact_decode
| |  DecodeCommit
| |  ReorderBufferCommit
...
| |  --6.10%--pgoutput_change
| |    |
| |    |--3.09%--logicalrep_write_update
| |      ....
| |      |--2.01%--OutputPluginWrite
| |            |--1.97%--WalSndWriteData

Case 2: 'track_commit_timestamp' = ON
|--53.19%--xact_decode
| |  DecodeCommit
| |  ReorderBufferCommit
...
| |     --30.25%--pgoutput_change
| |      |
| |      |--15.23%--logicalrep_write_update
| |      ....
| |      |--9.82%--OutputPluginWrite
| |           |--9.57%--WalSndWriteData

In Case 2, the subscriber writing commit timestamp data for millions
of rows appears to have slowed down all write operations on the
machine.

To confirm the profiling results, we conducted the same test with the
publisher and subscriber on separate machines.

Results:
Case 1: 'track_commit_timestamp' = OFF
  - Run 1: Pub TPS: 2144.10, Sub TPS: 2216.02
  - Run 2: Pub TPS: 2159.41, Sub TPS: 2233.82

Case 2: 'track_commit_timestamp' = ON
  - Run 1: Pub TPS: 2174.39, Sub TPS: 2226.89
  - Run 2: Pub TPS: 2148.92, Sub TPS: 2224.80

Note: The machines used in this test were not as powerful as the one
used in the earlier tests, resulting in lower overall TPS (~2k vs.
~8-9k). However, the results show no significant reduction in the
publisher's TPS, indicating minimal impact when the nodes run on
separate machines.
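
To put a number on "no significant reduction", the two runs per case
can be averaged and compared. The sketch below is illustrative only;
the names runs, mean_tps and rel_change_pct are invented for this
example, and two runs per case are too few for any statistical claim.

```python
from statistics import mean

# (publisher TPS, subscriber TPS) for each run, copied from the
# separate-machine results above. Keys are the track_commit_timestamp setting.
runs = {
    "off": [(2144.10, 2216.02), (2159.41, 2233.82)],
    "on": [(2174.39, 2226.89), (2148.92, 2224.80)],
}

PUB, SUB = 0, 1  # tuple positions


def mean_tps(case: str, node: int) -> float:
    """Average TPS of one node (PUB or SUB) over all runs of a case."""
    return mean(run[node] for run in runs[case])


def rel_change_pct(node: int) -> float:
    """Change of the mean TPS from OFF to ON, as a percentage of the OFF mean."""
    base = mean_tps("off", node)
    return (mean_tps("on", node) - base) / base * 100.0


pub_change = rel_change_pct(PUB)
sub_change = rel_change_pct(SUB)

print(f"Pub mean TPS change (OFF -> ON): {pub_change:+.2f}%")
print(f"Sub mean TPS change (OFF -> ON): {sub_change:+.2f}%")
```

With these inputs the change comes out under 1% for both nodes, which
is smaller than the spread between the two runs of Case 1.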

> Was track_commit_timestamp enabled only on subscriber (as needed) or
> on both publisher and subscriber? Nisha, can you please confirm from
> your logs?

Yes, track_commit_timestamp was enabled only on the subscriber.

> > > Case3: track_commit_timestamp = on, detect_conflict = off
> > >     Pub-tps = 8886.101726
> > >     Sub-tps = 8374.508017
> > >     Time of replicating all the changes: 19min 35s
> > > Case 4: track_commit_timestamp = off, detect_conflict = on
> > >     Pub-tps = 8981.924596
> > >     Sub-tps = 8411.120808
> > >     Time of replicating all the changes: 19min 27s
> > >
> > > **The difference of TPS between each case is small. While I can see a
> > > slight increase of the replication time (about 5%), when enabling both
> > > track_commit_timestamp and detect_conflict.
> > >
> >
> > The difference in TPS between case 1 and case 2 is quite visible.
> > IIUC, the replication time difference is due to the logging of
> > conflicts, right?
> >

Right, the major difference is due to the logging of conflicts.
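
For reference, the replication times of the four cases quoted above
can be compared against Case 1. This is a rough sketch; to_seconds and
the dictionary labels are names chosen for this example. It shows only
where the increase appears, not what causes it.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a duration written like '19min 28s' to seconds."""
    match = re.fullmatch(r"(\d+)min (\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return minutes * 60 + seconds


# Replication times from Test 1, keyed by
# (track_commit_timestamp, detect_conflict).
times = {
    ("off", "off"): "19min 28s",  # Case 1
    ("on", "on"): "20min 20s",    # Case 2
    ("on", "off"): "19min 35s",   # Case 3
    ("off", "on"): "19min 27s",   # Case 4
}

baseline = to_seconds(times[("off", "off")])

# Percentage change of each case relative to Case 1.
change_pct = {
    settings: (to_seconds(duration) - baseline) / baseline * 100.0
    for settings, duration in times.items()
}

for (commit_ts, detect), pct in change_pct.items():
    print(f"track_commit_timestamp={commit_ts}, detect_conflict={detect}: {pct:+.2f}%")
```

Only the case with both settings on differs from Case 1 by more than
1%, which matches the explanation that conflict logging accounts for
most of the extra time.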

--
Thanks,
Nisha


