Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Önder Kalacı
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	August 1, 2022 19:21:48
Msg-id	CACawEhXX7JcJVwmbQzd54ehxZTUF5th+XhAz04TPkLVQdT5m+Q@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List	pgsql-hackers

Tree view

Hi,

As far as I can see, the following is the answer to the only remaining open discussion in this thread. Let me know if anything is missed.

(b) it appears to me that the patch decides
>> which index to use the first time it opens the rel (or if the rel gets
>> invalidated) on subscriber and then for all consecutive operations it
>> uses the same index. It is quite possible that after some more
>> operations on the table, using the same index will actually be
>> costlier than a sequence scan or some other index scan
>
>
> Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding this.
>
> Currently, I can see two options and happy to hear your take on these (or maybe another idea?)
>
> - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense.
> However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder.
>

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly. We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.

It turns out that we already invalidate the relevant entries in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates any of the statistics in pg_class.

The call-stack for analyze is roughly:

do_analyze_rel()

-> vac_update_relstats()

-> heap_inplace_update()

-> if needs to apply any statistical change

-> CacheInvalidateHeapTuple()

And, we register for those invalidations already:

logicalrep_relmap_init() / logicalrep_partmap_init()

-> CacheRegisterRelcacheCallback()

Added a test which triggers this behavior. The test is as follows:

- Create two indexes on the target, on column_a and column_b

- Initially load data such that the column_a has a high cardinality

- Show that we use the index on column_a

- Load more data such that the column_b has higher cardinality

- ANALYZE on the target table

- Show that we use the index on column_b afterwards

Thanks,

Onder KALACI

Attachment

v4_0001_use_index_on_subs_when_pub_rep_ident_full.patch

pgsql-hackers by date:

From: Simon Riggs
Date: 01 August 2022, 19:00:41
Subject: Re: Dump/Restore of non-default PKs

From: Jacob Champion
Date: 01 August 2022, 19:30:39
Subject: Re: [Commitfest 2022-07] Patch Triage: Waiting on Author

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Attachment

Previous

Next