Re: Conflict detection and logging in logical replication - Mailing list pgsql-hackers
From | Nisha Moond |
---|---|
Subject | Re: Conflict detection and logging in logical replication |
Date | |
Msg-id | CABdArM6gULXDHKwpuWWfLeHCpkrnbv4oOUw7igiW7ziPxLp5Gg@mail.gmail.com |
In response to | Re: Conflict detection and logging in logical replication (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
On Mon, Aug 5, 2024 at 10:05 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Aug 5, 2024 at 9:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Aug 2, 2024 at 6:28 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> > >
> > > Performance tests done on the v8-0001 and v8-0002 patches, available at [1].
> > >
> >
> > Thanks for doing the detailed tests for this patch.
> >
> > > The purpose of the performance tests is to measure the impact on
> > > logical replication with track_commit_timestamp enabled, as this
> > > involves fetching the commit_ts data to determine
> > > delete_differ/update_differ conflicts.
> > >
> > > Fortunately, we did not see any noticeable overhead from the new
> > > commit_ts fetch and comparison logic. The only notable impact is
> > > potential overhead from logging conflicts if they occur frequently.
> > > Therefore, enabling conflict detection by default seems feasible, and
> > > introducing a new detect_conflict option may not be necessary.
> > >
> > ...
> > >
> > > Test 1: create conflicts on Sub using pgbench.
> > > ----------------------------------------------------------------
> > > Setup:
> > > - Both publisher and subscriber have pgbench tables created as-
> > > pgbench -p $node1_port postgres -qis 1
> > > - At Sub, a subscription created for all the changes from Pub node.
> > >
> > > Test Run:
> > > - To test, ran pgbench for 15 minutes on both nodes simultaneously,
> > > which led to concurrent updates and update_differ conflicts on the
> > > Subscriber node.
> > > Command used to run pgbench on both nodes-
> > > ./pgbench postgres -p 8833 -c 10 -j 3 -T 300 -P 20
> > >
> > > Results:
> > > For each case, note the “tps” and total time taken by the apply-worker
> > > on Sub to apply the changes coming from Pub.
> > >
> > > Case1: track_commit_timestamp = off, detect_conflict = off
> > > Pub-tps = 9139.556405
> > > Sub-tps = 8456.787967
> > > Time of replicating all the changes: 19min 28s
> > > Case 2 : track_commit_timestamp = on, detect_conflict = on
> > > Pub-tps = 8833.016548
> > > Sub-tps = 8389.763739
> > > Time of replicating all the changes: 20min 20s
> > >
> >
> > Why is there a noticeable tps (~3%) reduction in publisher TPS? Is it
> > the impact of track_commit_timestamp = on or something else?

When both the publisher and subscriber nodes are on the same machine,
we observe a decrease in the publisher's TPS when
'track_commit_timestamp' is ON for the subscriber. Testing on pgHead
(without the patch) also showed a similar reduction in the publisher's
TPS.

Test Setup: The test was conducted with the same setup as Test-1.

Results:
Case 1: pgHead - 'track_commit_timestamp' = OFF
 - Pub TPS: 9306.25
 - Sub TPS: 8848.91
Case 2: pgHead - 'track_commit_timestamp' = ON
 - Pub TPS: 8915.75
 - Sub TPS: 8667.12

On pgHead too, there was a ~400 TPS reduction in the publisher when
'track_commit_timestamp' was enabled on the subscriber.

Additionally, code profiling of the walsender on the publisher showed
that the overhead in Case-2 was mainly in the DecodeCommit() call
stack, causing slower write operations, especially in
logicalrep_write_update() and OutputPluginWrite().

case1 : 'track_commit_timestamp' = OFF
--11.57%--xact_decode
| | DecodeCommit
| | ReorderBufferCommit
...
| | --6.10%--pgoutput_change
| | |
| | |--3.09%--logicalrep_write_update
| | ....
| | |--2.01%--OutputPluginWrite
| | |--1.97%--WalSndWriteData

case2: 'track_commit_timestamp' = ON
|--53.19%--xact_decode
| | DecodeCommit
| | ReorderBufferCommit
...
| | --30.25%--pgoutput_change
| | |
| | |--15.23%--logicalrep_write_update
| | ....
| | |--9.82%--OutputPluginWrite
| | |--9.57%--WalSndWriteData
--

In Case 2, the subscriber's process of writing timestamp data for
millions of rows appears to have impacted all write operations on the
machine.

To confirm the profiling results, we conducted the same test with the
publisher and subscriber on separate machines.

Results:
Case 1: 'track_commit_timestamp' = OFF
 - Run 1: Pub TPS: 2144.10, Sub TPS: 2216.02
 - Run 2: Pub TPS: 2159.41, Sub TPS: 2233.82
Case 2: 'track_commit_timestamp' = ON
 - Run 1: Pub TPS: 2174.39, Sub TPS: 2226.89
 - Run 2: Pub TPS: 2148.92, Sub TPS: 2224.80

Note: The machines used in this test were not as powerful as the one
used in the earlier tests, resulting in lower overall TPS (~2k vs.
~8-9k). However, the results show no significant reduction in the
publisher's TPS, indicating minimal impact when the nodes are run on
separate machines.

> Was track_commit_timestamp enabled only on subscriber (as needed) or
> on both publisher and subscriber? Nisha, can you please confirm from
> your logs?

Yes, track_commit_timestamp was enabled only on the subscriber.

> > > Case3: track_commit_timestamp = on, detect_conflict = off
> > > Pub-tps = 8886.101726
> > > Sub-tps = 8374.508017
> > > Time of replicating all the changes: 19min 35s
> > > Case 4: track_commit_timestamp = off, detect_conflict = on
> > > Pub-tps = 8981.924596
> > > Sub-tps = 8411.120808
> > > Time of replicating all the changes: 19min 27s
> > >
> > > **The difference of TPS between each case is small. While I can see a
> > > slight increase of the replication time (about 5%), when enabling both
> > > track_commit_timestamp and detect_conflict.
> > >
> >
> > The difference in TPS between case 1 and case 2 is quite visible.
> > IIUC, the replication time difference is due to the logging of
> > conflicts, right?
>
> Right, the major difference is due to the logging of conflicts.

--
Thanks,
Nisha
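
The relative changes discussed in this message can be rechecked with a few lines of Python. This is only a sketch for illustration: the helper names (pct_change, to_seconds, mean) and the dictionary layout are assumptions made here and are not part of the patch or of pgbench. The numbers are copied from the results reported above, and the separate-machine figures are averaged over the two runs.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def to_seconds(minutes, seconds):
    """Convert an 'Xmin Ys' duration to seconds."""
    return minutes * 60 + seconds


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Publisher TPS pairs (baseline, comparison), copied from this message.
# - "v8 patch": Case 1 (both settings off) vs. Case 2 (both settings on)
# - "pgHead": track_commit_timestamp off vs. on for the subscriber
# - "separate machines": same comparison, mean of Run 1 and Run 2
pub_tps = {
    "v8 patch, same machine": (9139.556405, 8833.016548),
    "pgHead, same machine": (9306.25, 8915.75),
    "separate machines": (mean([2144.10, 2159.41]), mean([2174.39, 2148.92])),
}

pub_tps_change = {
    name: pct_change(baseline, comparison)
    for name, (baseline, comparison) in pub_tps.items()
}

# Replication time: Case 1 (19min 28s) vs. Case 2 (20min 20s).
replication_time_change = pct_change(to_seconds(19, 28), to_seconds(20, 20))

for name, change in pub_tps_change.items():
    print(f"{name}: {change:+.2f}% publisher TPS")
print(f"replication time, Case 1 vs. Case 2: {replication_time_change:+.2f}%")
```

On these inputs the same-machine drops in publisher TPS come out at roughly 3% to 4%, while the separate-machine difference stays under 1%, which matches the observations above.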