Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Önder Kalacı
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	October 11, 2022 15:44:06
Msg-id	CACawEhX0153mUsEAHxmgRDKZhq69Ygg5CGzasKgqJOTDgro4og@mail.gmail.com Whole thread Raw
In response to	RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
Responses	RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
List	pgsql-hackers

Tree view

Hi Kuroda Hayato,

===
01. relation.c - GetCheapestReplicaIdentityFullPath

```
* The reason that the planner would not pick partial indexes and indexes
* with only expressions based on the way currently baserestrictinfos are
* formed (e.g., col_1 = $1 ... AND col_N = $2).
```

Is "col_N = $2" a typo? I think it should be "col_N = $N" or "attr1 = $1 ... AND attrN = $N".

Yes, it is a typo, fixed now.

===
02. 032_subscribe_use_index.pl

If a table has a primary key on the subscriber, it will be used even if enable_indexscan is false(legacy behavior).
Should we test it?

Yes, good idea. I added two tests, one test that we cannot use regular indexes when index scan is disabled, and another one that we use replica identity index when index scan is disabled. This is useful to make sure if someone changes the behavior can see the impact.

~~~
03. 032_subscribe_use_index.pl - SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX

I think this test seems to be not trivial, so could you write down the motivation?

makes sense, done

~~~
04. 032_subscribe_use_index.pl - SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX

```
# wait until the index is created
$node_subscriber->poll_query_until(
'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0 updates one row via index";
```

CREATE INDEX is a synchronous behavior, right? If so we don't have to wait here.
...And the comment in case of die may be wrong.
(There are some cases like this)

It is not about CREATE INDEX being async. It is about pg_stat_all_indexes being async. If we do not wait, the tests become flaky, because sometimes the update has not been reflected in the view immediately.

This is explained here: PostgreSQL: Documentation: 14: 28.2. The Statistics Collector

When using the statistics to monitor collected data, it is important to realize that the information does not update instantaneously. Each individual server process transmits new statistical counts to the collector just before going idle; so a query or transaction still in progress does not affect the displayed totals. Also, the collector itself emits a new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 ms unless altered while building the server). So the displayed information lags behind actual activity. However, current-query information collected by track_activities is always up-to-date.

~~~
05. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS

```
# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
#
# Basic test where the subscriber uses index
# and touches 50 rows with UPDATE
```

"touches 50 rows with UPDATE" -> "updates 50 rows", per other tests.

fixed

~~~
06. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

I think this test seems to be not trivial, so could you write down the motivation?
(Same as Re-calclate)

sure, done

~~~
07. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

```
# show that index_b is not used
$node_subscriber->poll_query_until(
'postgres', q{select idx_scan=0 from pg_stat_all_indexes where indexrelname = 'index_b';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-2";
```

I think we don't have to wait here, is() should be used instead.
poll_query_until() should be used only when idx_scan>0 is checked.
(There are some cases like this)

Yes, makes sense

~~~
08. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

```
# make sure that the subscriber has the correct data
$node_subscriber->poll_query_until(
'postgres', q{select sum(user_id+value_1+value_2)=550070 AND count(DISTINCT(user_id,value_1, value_2))=981 from users_table_part;}
) or die "ensure subscriber has the correct data at the end of the test";
```

Ah, for this case, we already have is() checks for the same results, this is just a left-over from the earlier iterations

I think we can replace it to wait_for_catchup() and is()...
Moreover, we don't have to wait here because in above line we wait until the index is used on the subscriber.
(There are some cases like this)

Fixed a few more such cases.

Thanks for the review! Attached v16.

Onder KALACI

Attachment

v16_0001_use_index_on_subs_when_pub_rep_ident_full.patch

pgsql-hackers by date:

From: Laurenz Albe
Date: 11 October 2022, 15:37:25
Subject: Make EXPLAIN generate a generic plan for a parameterized query

From: hubert depesz lubaczewski
Date: 11 October 2022, 15:45:24
Subject: Re: Re: Is there any plan to support online schem change in postgresql?

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Attachment

Previous

Next