Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhXX7JcJVwmbQzd54ehxZTUF5th+XhAz04TPkLVQdT5m+Q@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List pgsql-hackers
Hi,

As far as I can see, the following is the answer to the only remaining open discussion in this thread. Let me know if anything is missed.

(b) it appears to me that the patch decides
>> which index to use the first time it opens the rel (or if the rel gets
>> invalidated) on subscriber and then for all consecutive operations it
>> uses the same index. It is quite possible that after some more
>> operations on the table, using the same index will actually be
>> costlier than a sequence scan or some other index scan
>
>
> Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding this.
>
> Currently, I can see two options and happy to hear your take on these (or maybe another idea?)
>
> - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense.
> However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder.
>

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly. We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.


It turns out that we already invalidate the relevant entries in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates any of the statistics in pg_class.

The call-stack for analyze is roughly:
do_analyze_rel()
   -> vac_update_relstats()
     -> heap_inplace_update()
         -> if needs to apply any statistical change
             -> CacheInvalidateHeapTuple()

And, we register for those invalidations already:
logicalrep_relmap_init() / logicalrep_partmap_init()
   -> CacheRegisterRelcacheCallback()



Added a test which triggers this behavior. The test is as follows:
- Create two indexes on the target, on column_a and column_b
- Initially load data such that the column_a has a high cardinality
- Show that we use the index on column_a
- Load more data such that the column_b has higher cardinality
- ANALYZE on the target table
- Show that we use the index on column_b afterwards

Thanks,
Onder KALACI
 
Attachment

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Dump/Restore of non-default PKs
Next
From: Jacob Champion
Date:
Subject: Re: [Commitfest 2022-07] Patch Triage: Waiting on Author