Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date:
Msg-id: CAA4eK1JxocgW+CKUV=bcomTfyANMo__Og+g9ruTxpc=JXLarVQ@mail.gmail.com
In response to: RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher ("shiy.fnst@fujitsu.com" <shiy.fnst@fujitsu.com>)
Responses:
  Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
  Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List: pgsql-hackers
On Thu, Mar 2, 2023 at 1:37 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Wed, Mar 1, 2023 9:22 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> > > # Result
> > > The time executing update (the average of 3 runs is taken, the unit is
> > > milliseconds):
> > >
> > Shi Yu, could it be possible for you to re-run the tests with some more
> > runs, and share the average? I suspect maybe your test results have a
> > very small pool size, and some runs are making the average slightly
> > problematic.
> >
> > In my tests, I shared the total time, which is probably also fine.
> >
> Thanks for your reply, I re-tested (based on
> v25_0001_use_index_on_subs_when_pub_rep_ident_full.patch) and took the
> average of 100 runs. The results are as follows. The unit is milliseconds.
>
> case1
> sequential scan: 1348.57
> index scan: 3785.15
>
> case2
> sequential scan: 1350.26
> index scan: 1754.01
>
> case3
> sequential scan: 1350.13
> index scan: 1340.97
>
> There was still some degradation in the first two cases. There are some
> gaps in our test results. Some information about my test is as follows.
>
> a. Some parameters specified in postgresql.conf.
> shared_buffers = 8GB
> checkpoint_timeout = 30min
> max_wal_size = 20GB
> min_wal_size = 10GB
> autovacuum = off
>
> b. Executed SQL.
> I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you
> did the same, or just executed 50 consecutive UPDATEs. If the latter one,
> there would be lots of old tuples and this might have a bigger impact on
> sequential scan. I tried this case (which executes 50 consecutive UPDATEs)
> and also saw that the overhead is smaller than before.
>
> Besides, I looked into the regression of this patch with `gprof`. Some
> results are as follows. I think with single buffer lock, sequential scan
> can scan multiple tuples (see heapgettup()), while index scan can only
> scan one tuple. So in case1, which has lots of duplicate values and more
> tuples need to be scanned, index scan takes longer time.
>
> - results of `gprof`
> case1:
> master
>   %   cumulative   self               self     total
>  time   seconds   seconds     calls  ms/call  ms/call  name
>  1.37      0.66      0.01    654312     0.00     0.00  LWLockAttemptLock
>  0.00      0.73      0.00    573358     0.00     0.00  LockBuffer
>  0.00      0.73      0.00     10014     0.00     0.06  heap_getnextslot
>
> patched
>   %   cumulative   self               self     total
>  time   seconds   seconds     calls  ms/call  ms/call  name
>  9.70      1.27      0.36  50531459     0.00     0.00  LWLockAttemptLock
>  3.23      2.42      0.12 100259200     0.00     0.00  LockBuffer
>  6.20      1.50      0.23  50015101     0.00     0.00  heapam_index_fetch_tuple
>  4.04      2.02      0.15  50015101     0.00     0.00  index_fetch_heap
>  1.35      3.21      0.05     10119     0.00     0.00  index_getnext_slot
>

In the above profile, the number of calls to index_fetch_heap() and
heapam_index_fetch_tuple() explains the regression you are seeing with the
index scan. Because the updates generate dead tuples in the same
transaction and those dead tuples won't be removed, we get them from the
index and then need to perform index_fetch_heap() to find out whether the
tuple is dead or not. Now, a sequential scan also has to visit those dead
tuples, but there is no back-and-forth between the index and the heap. I
think we can check case-1 once more with a larger number of tuples (say
20000, 50000, etc.).

--
With Regards,
Amit Kapila.
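
[For readers who want to reproduce the comparison, here is a minimal
sketch of a case-1 style benchmark in the spirit of the setup described
above. The actual test script is not shown in this message, so the table
name, schema, row count, and duplicate distribution below are illustrative
assumptions, not the test that was actually run.]

    -- A table with many duplicate values and REPLICA IDENTITY FULL, so the
    -- subscriber must compare all columns to locate each updated row.
    CREATE TABLE tbl (a int, b int);
    ALTER TABLE tbl REPLICA IDENTITY FULL;

    -- Reset state before each timed run, as in Shi Yu's setup.
    TRUNCATE tbl;
    -- Try larger row counts (20000, 50000, ...) as suggested for case-1.
    INSERT INTO tbl SELECT i % 10, i FROM generate_series(1, 20000) i;

    -- The timed statement on the publisher; each replicated change has to
    -- find its target tuple on the subscriber.
    UPDATE tbl SET b = b + 1;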
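
[The dead-tuple effect discussed above can also be observed directly: with
autovacuum = off, each consecutive full-table UPDATE leaves one dead tuple
per row, and an index scan pays an index-to-heap visibility check for
every matching entry, dead ones included, while a sequential scan visits
the same dead tuples within a page-at-a-time heap scan. A hedged
illustration, reusing the assumed table from the sketch above:]

    -- Run several consecutive UPDATEs without TRUNCATE in between ...
    UPDATE tbl SET b = b + 1;  -- repeat, e.g., 50 times

    -- ... then check how many dead tuples have accumulated.
    SELECT n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = 'tbl';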