Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhXavM7kjzVgMfRSN+YyB_Lxqjoi8PJ2Gh7MdL-eScb9vw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List pgsql-hackers
Hi,

I'm a little late to catch up with your comments, but here are my replies:

> My answer for the above assumes that your question is regarding what happens if you ANALYZE on a partitioned table. If your question is something different, please let me know.
>

I was talking about inheritance cases, something like:
create table tbl1 (a int);
create table tbl1_part1 (b int) inherits (tbl1);
create table tbl1_part2 (c int) inherits (tbl1);

What we do in such cases is documented as: "if the table being
analyzed has inheritance children, ANALYZE gathers two sets of
statistics: one on the rows of the parent table only, and a second
including rows of both the parent table and all of its children. This
second set of statistics is needed when planning queries that process
the inheritance tree as a whole. The child tables themselves are not
individually analyzed in this case."

Oh, I haven't considered inherited tables. That seems right, the statistics of the children are not updated when the parent is analyzed. 
 
Now, the point I was worried about was what if the changes in child
tables (*_part1, *_part2) are much more than in tbl1? In such cases,
we may not invalidate child rel entries, so how will logical
replication behave for updates/deletes on child tables? There may not
be any problem here but it is better to do some analysis of such cases
to see how it behaves.

I also haven't observed any specific issues. In the end, when the user (or autovacuum) does ANALYZE on the child, it is when the statistics are updated for the child. Although I do not have much experience with inherited tables, this sounds like the expected behavior?

I also pushed a test covering inherited tables. First, a basic test on the parent. Then, show that updates on the parent can also use indexes of the children. Also, after an ANALYZE on the child, we can re-calculate the index and use the index with a higher cardinality column.


> Also, for the majority of the use-cases, I think we'd probably expect an index on a column with high cardinality -- hence use index scan. So, bitmap index scans are probably not going to be that much common.
>

You are probably right here but I don't think we can make such
assumptions. I think the safest way to avoid any regression here is to
choose an index when the planner selects an index scan. We can always
extend it later to bitmap scans if required. We can add a comment
indicating the same.


Alright, I got rid of the bitmap scans. 

Though, it caused few of the new tests to fail. I think because of the data size/distribution, the planner picks bitmap scans. To make the tests consistent and small, I added `enable_bitmapscan to off` for this new test file. Does that sound ok to you? Or, should we change the tests to make sure they genuinely use index scans?

*
+ /*
+ * For insert-only workloads, calculating the index is not necessary.
+ * As the calculation is not expensive, we are fine to do here (instead
+ * of during first update/delete processing).
+ */

I think here instead of talking about cost, we should mention that it
is quite an infrequent operation i.e performed only when we first time
performs an operation on the relation or after invalidation. This is
because I think the cost is relative.

Changed, does that look better? 

+
+ /*
+ * Although currently it is not possible for planner to pick a
+ * partial index or indexes only on expressions,

It may be better to expand this comment by describing a bit why it is
not possible in our case. You might want to give the function
reference where it is decided.

Make sense, added some more information.

Thanks,
Onder 
 
Attachment

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: sslinfo extension - add notbefore and notafter timestamps
Next
From: Erik Rijkers
Date:
Subject: Re: Schema variables - new implementation for Postgres 15