Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CAA4eK1JxocgW+CKUV=bcomTfyANMo__Og+g9ruTxpc=JXLarVQ@mail.gmail.com
In response to RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  ("shiy.fnst@fujitsu.com" <shiy.fnst@fujitsu.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List pgsql-hackers
On Thu, Mar 2, 2023 at 1:37 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Wed, Mar 1, 2023 9:22 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> > > # Result
> > The time spent executing the update (the average of 3 runs, in
> > milliseconds):
> >
> > Shi Yu, could it be possible for you to re-run the tests with some more runs, and share the average?
> > I suspect your sample size is very small, and a few outlier runs may be
> > skewing the average.
> >
> > In my tests, I shared the total time, which is probably also fine.
> >
>
> Thanks for your reply, I re-tested (based on
> v25_0001_use_index_on_subs_when_pub_rep_ident_full.patch) and took the average
> of 100 runs. The results are as follows. The unit is milliseconds.
>
> case1
> sequential scan: 1348.57
> index scan: 3785.15
>
> case2
> sequential scan: 1350.26
> index scan: 1754.01
>
> case3
> sequential scan: 1350.13
> index scan: 1340.97
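
For reference, the relative slowdowns implied by the averages above can be worked out directly from the reported timings (a quick sanity check, not new measurements):

```python
# Relative cost of index scan vs. sequential scan, computed from the
# averaged timings (milliseconds) reported above.
cases = {
    "case1": (1348.57, 3785.15),
    "case2": (1350.26, 1754.01),
    "case3": (1350.13, 1340.97),
}

for name, (seq_ms, idx_ms) in cases.items():
    # ratio > 1.0 means the index scan was slower than the sequential scan
    print(f"{name}: index/seq = {idx_ms / seq_ms:.2f}x")
```

This shows roughly a 2.81x slowdown in case1, 1.30x in case2, and near-parity (0.99x) in case3.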
>
> There was still some degradation in the first two cases, and there are some
> gaps between our respective test results. Some information about my test is
> as follows.
>
> a. Some parameters specified in postgresql.conf.
> shared_buffers = 8GB
> checkpoint_timeout = 30min
> max_wal_size = 20GB
> min_wal_size = 10GB
> autovacuum = off
>
> b. Executed SQL.
> I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you did the
> same, or just executed 50 consecutive UPDATEs. In the latter case, there would
> be lots of old tuples, which might have a bigger impact on the sequential scan.
> I tried this case (executing 50 consecutive UPDATEs) and also saw that the
> overhead is smaller than before.
>
>
> Besides, I looked into the regression of this patch with `gprof`. Some results
> are as follows. I think that with a single buffer lock, a sequential scan can
> read multiple tuples (see heapgettup()), while an index scan fetches only one
> tuple per lock. So in case1, which has lots of duplicate values and therefore
> more tuples to scan, the index scan takes longer.
>
> - results of `gprof`
> case1:
> master
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>   1.37      0.66     0.01   654312     0.00     0.00  LWLockAttemptLock
>   0.00      0.73     0.00   573358     0.00     0.00  LockBuffer
>   0.00      0.73     0.00    10014     0.00     0.06  heap_getnextslot
>
> patched
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>   9.70      1.27     0.36 50531459     0.00     0.00  LWLockAttemptLock
>   3.23      2.42     0.12 100259200     0.00     0.00  LockBuffer
>   6.20      1.50     0.23 50015101     0.00     0.00  heapam_index_fetch_tuple
>   4.04      2.02     0.15 50015101     0.00     0.00  index_fetch_heap
>   1.35      3.21     0.05    10119     0.00     0.00  index_getnext_slot
>

In the above profile, the number of calls to index_fetch_heap() and
heapam_index_fetch_tuple() explains the regression you are seeing with
the index scan. Because the update generates dead tuples in the same
transaction, and those dead tuples won't be removed, we get them from
the index and then need to perform index_fetch_heap() to find out
whether each tuple is dead or not. A sequential scan also has to scan
those dead tuples, but it doesn't need to go back and forth between
the index and the heap. I think we can check once with a larger number
of tuples (say 20000, 50000, etc.) for case-1.
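
To make the buffer-lock asymmetry in the profile concrete, here is a rough back-of-the-envelope model (plain Python, not PostgreSQL code; the tuples-per-page figure is an assumed round number, and index-page locks are ignored):

```python
# Illustrative model of why an index scan takes many more buffer locks
# than a sequential scan when a large number of (dead or duplicate)
# tuples must be visited. ASSUMPTION: ~100 heap tuples fit on one page;
# the real figure depends on tuple width and fill factor.

TUPLES_PER_PAGE = 100

def seq_scan_locks(tuples_visited):
    # heapgettup() locks a heap buffer once per page and returns many
    # tuples under that single lock.
    return -(-tuples_visited // TUPLES_PER_PAGE)  # ceiling division

def index_scan_locks(tuples_visited):
    # Each matching index entry triggers its own index_fetch_heap()
    # call, which locks a heap buffer for that one tuple.
    return tuples_visited

# ~5e7 tuple visits, on the order of the ~50 million
# heapam_index_fetch_tuple / index_fetch_heap calls in the profile above.
visits = 50_000_000
print(seq_scan_locks(visits))    # pages locked, tuples amortized per lock
print(index_scan_locks(visits))  # one heap fetch (and lock) per tuple
```

Under these assumptions the sequential scan takes on the order of 500,000 page locks for the same work the index scan spends 50 million per-tuple heap fetches on, which is consistent with the LockBuffer call counts in the two profiles.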

--
With Regards,
Amit Kapila.


