Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Önder Kalacı
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	March 1, 2023 16:21:52
Msg-id	CACawEhWbq207=dDvRzteiXoAtGUS=eTcdkBEJ+Lh-bKkohFqYg@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Andres Freund <andres@anarazel.de>)
Responses	RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List	pgsql-hackers

Tree view

Hi Andres, Amit, Shi Yu, all

Andres Freund <andres@anarazel.de>, 28 Şub 2023 Sal, 21:39 tarihinde şunu yazdı:

Hi,

On 2023-02-25 16:00:05 +0530, Amit Kapila wrote:
> On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >> I think this overhead seems to be mostly due to the need to perform
> >> tuples_equal multiple times for duplicate values.

I think more work needs to be done to determine the source of the
overhead. It's not clear to me why there'd be an increase in tuples_equal()
calls in the tests upthread.

You are right, looking closely, in fact, we most of the time do much less

tuples_equal() with index scan.

I've done some profiling with perf, and created flame graphs for the apply worker, with the

test described above: -- case 1 (All values are duplicated). I used the following commands:

- perf record -F 99 -p 122555 -g -- sleep 60

- perf script | ./stackcollapse-perf.pl > out.perf-folded

- ./flamegraph.pl out.perf-folded > perf_[index|seq]_scan.svg

I attached both flame graphs. I do not see anything specific regarding what the patch does, but

instead the difference mostly seems to come down to index scan vs sequential scan related

functions. As I continue to investigate, I thought it might be useful to share the flame graphs

so that more experienced hackers could comment on the difference.

Regarding my own end-to-end tests: In some runs, the sequential scan is indeed faster for case-1. But,

when I execute update tbl set a=a+1; for 50 consecutive times, and measure end to end performance, I see

much better results for index scan, only case-1 is on-par as mostly I'd expect.

Case-1, running the update 50 times and waiting all changes applied

index scan: 2minutes 36 seconds
sequential scan: 2minutes 30 seconds

Case-2, running the update 50 times and waiting all changes applied

index scan: 1 minutes, 2 seconds
sequential scan: 2minutes 30 seconds

Case-7, running the update 50 times and waiting all changes applied

index scan: 6 seconds
sequential scan: 2minutes 26seconds

> # Result

The time executing update (the average of 3 runs is taken, the unit is
milliseconds):

Shi Yu, could it be possible for you to re-run the tests with some more runs, and share the average?

I suspect maybe your test results have a very small pool size, and some runs are making

the average slightly problematic.

In my tests, I shared the total time, which is probably also fine.

Thanks,

Onder

Attachment

pgsql-hackers by date:

From: Mikhail Gribkov
Date: 01 March 2023, 16:12:14
Subject: Re: On login trigger: take three

From: gkokolatos@pm.me
Date: 01 March 2023, 16:39:14
Subject: Re: Add LZ4 compression in pg_dump

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Attachment

Previous

Next