Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	July 20, 2022 04:50:45
Msg-id	CAA4eK1+mmsZqyVrVO6ey2QjYzd2EX344-jryiz3BXiHVwnQ7Eg@mail.gmail.com Whole thread
In response to	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Önder Kalacı <onderkalaci@gmail.com>)
Responses	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List	pgsql-hackers

Tree view

On Tue, Jul 19, 2022 at 1:46 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi, thanks for your reply.
>
> Amit Kapila <amit.kapila16@gmail.com>, 18 Tem 2022 Pzt, 08:29 tarihinde şunu yazdı:
>>
>> >
>>
>> IIUC, this proposal is to optimize cases where users can't have a
>> unique/primary key for a relation on the subscriber and those
>> relations receive lots of updates or deletes?
>
>
> Yes, that is right.
>
> In a similar perspective, I see this patch useful for reducing the "use primary key/unique index" requirement to "use
anyindex" for a reasonably performant logical replication with updates/deletes. 
>

Agreed. BTW, have you seen any such requirements from users where this
will be useful for them?

>>
>>
>> It seems that in favorable cases it will improve performance but we
>> should consider unfavorable cases as well. Two things that come to
>> mind in that regard are (a) while choosing index/seq. scan paths, the
>> patch doesn't account for cost for tuples_equal() which needs to be
>> performed for index scans, (b) it appears to me that the patch decides
>> which index to use the first time it opens the rel (or if the rel gets
>> invalidated) on subscriber and then for all consecutive operations it
>> uses the same index. It is quite possible that after some more
>> operations on the table, using the same index will actually be
>> costlier than a sequence scan or some other index scan
>
>
> Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding this.
>
> Currently, I can see two options and happy to hear your take on these (or maybe another idea?)
>
> - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the
CacheRegisterRelcacheCallbackhelps us to re-create the cache entries. In this case, as far as I can see, we need a
callbackthat is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a
newindex makes sense. 
> However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes
evenharder. 
>

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly. We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.

 So, please see the next idea as well.
>
> - Ask users to manually pick the index they want to use: Currently, the main complexity of the patch comes with the
plannerrelated code. In fact, if you look into the logical replication related changes, those are relatively modest
changes.If we can drop the feature that Postgres picks the index, and provide a user interface to set the indexes per
tablein the subscription, we can probably have an easier patch to review & test. For example, we could add `ALTER
SUBSCRIPTIONsub ALTER TABLE t USE INDEX i` type of an API. This also needs some coding, but probably much simpler than
thecurrent code. And, obviously, this pops up the question of can users pick the right index? 
>

I think picking the right index is one point and another is what if
the subscription has many tables (say 10K or more), doing it for
individual tables per subscription won't be fun. Also, users need to
identify which tables belong to a particular subscription, now, users
can find the same via pg_subscription_rel or some other way but doing
this won't be straightforward for users. So, my inclination would be
to pick the right index automatically rather than getting the input
from the user.

Now, your point related to planner code in the patch bothers me as
well but I haven't studied the patch in detail to provide any
alternatives at this stage. Do you have any other ideas to make it
simpler or solve this problem in some other way?

--
With Regards,
Amit Kapila.

pgsql-hackers by date:

From: Thomas Munro
Date: 20 July 2022, 04:48:58
Subject: Re: Windows now has fdatasync()

From: Junwang Zhao
Date: 20 July 2022, 04:51:24
Subject: Re: Memory leak fix in psql

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Previous

Next